Data Mining and Homeland Security: An Overview



Summary 

Data mining has become one of the key features of many homeland security 
initiatives. Often used as a means for detecting fraud, assessing risk, and product 
retailing, data mining involves the use of data analysis tools to discover previously 
unknown, valid patterns and relationships in large data sets. In the context of 
homeland security, data mining can be a potential means to identify terrorist 
activities, such as money transfers and communications, and to identify and track 
individual terrorists themselves, such as through travel and immigration records. 

While data mining represents a significant advance in the type of analytical tools 
currently available, there are limitations to its capability. One limitation is that 
although data mining can help reveal patterns and relationships, it does not tell the 
user the value or significance of these patterns. These types of determinations must 
be made by the user. A second limitation is that while data mining can identify 
connections between behaviors and/or variables, it does not necessarily identify a 
causal relationship. Successful data mining still requires skilled technical and 
analytical specialists who can structure the analysis and interpret the output. 

Data mining is becoming increasingly common in both the private and public 
sectors. Industries such as banking, insurance, medicine, and retailing commonly use 
data mining to reduce costs, enhance research, and increase sales. In the public 
sector, data mining applications initially were used as a means to detect fraud and 
waste, but have grown to also be used for purposes such as measuring and improving 
program performance. However, some of the homeland security data mining 
applications represent a significant expansion in the quantity and scope of data to be 
analyzed. Some efforts that have attracted a higher level of congressional interest 
include the Terrorism Information Awareness (TIA) project (now-discontinued) and 
the Computer-Assisted Passenger Prescreening System II (CAPPS II) project
(now-canceled and replaced by Secure Flight). Other initiatives that have been the subject
of congressional interest include the Multi-State Anti-Terrorism Information 
Exchange (MATRIX), the Able Danger program, the Automated Targeting System 
(ATS), and data collection and analysis projects being conducted by the National 
Security Agency (NSA). 

As with other aspects of data mining, while technological capabilities are 
important, there are other implementation and oversight issues that can influence the 
success of a project's outcome. One issue is data quality, which refers to the 
accuracy and completeness of the data being analyzed. A second issue is the 
interoperability of the data mining software and databases being used by different 
agencies. A third issue is mission creep, or the use of data for purposes other than
those for which the data were originally collected. A fourth issue is privacy. Questions
that may be considered include the degree to which government agencies should use 
and mix commercial data with government data, whether data sources are being used 
for purposes other than those for which they were originally designed, and possible 
application of the Privacy Act to these initiatives. It is anticipated that congressional 
oversight of data mining projects will grow as data mining efforts continue to evolve. 
This report will be updated as events warrant. 



What Is Data Mining? 

Data mining involves the use of sophisticated data analysis tools to discover 
previously unknown, valid patterns and relationships in large data sets. 1 These tools 
can include statistical models, mathematical algorithms, and machine learning 
methods (algorithms that improve their performance automatically through 
experience, such as neural networks or decision trees). Consequently, data mining
consists of more than collecting and managing data; it also includes analysis and
prediction.

Data mining can be performed on data represented in quantitative, textual, or 
multimedia forms. Data mining applications can use a variety of parameters to 
examine the data. They include association (patterns where one event is connected 
to another event, such as purchasing a pen and purchasing paper), sequence or path 
analysis (patterns where one event leads to another event, such as the birth of a child 
and purchasing diapers), classification (identification of new patterns, such as 
coincidences between duct tape purchases and plastic sheeting purchases), clustering 
(finding and visually documenting groups of previously unknown facts, such as 
geographic location and brand preferences), and forecasting (discovering patterns 
from which one can make reasonable predictions regarding future activities, such as 
the prediction that people who join an athletic club may take exercise classes). 2 
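To make the "association" parameter concrete, the following minimal Python sketch scores the pen-and-paper example with two commonly used measures, support and confidence. The five transactions are invented for illustration, and `support`, `confidence`, and `count_containing` are hypothetical helper names, not functions from any particular data mining product.

```python
# Hypothetical toy data: each set holds the items bought in one transaction.
TRANSACTIONS = [
    {"pen", "paper", "stapler"},
    {"pen", "paper"},
    {"pen", "ink"},
    {"paper", "envelopes"},
    {"pen", "paper", "envelopes"},
]


def count_containing(itemset, transactions):
    """Number of transactions that include every item in `itemset`."""
    return sum(1 for t in transactions if itemset <= t)


def support(itemset, transactions):
    """Share of all transactions that include every item in `itemset`."""
    return count_containing(itemset, transactions) / len(transactions)


def confidence(antecedent, consequent, transactions):
    """Estimated probability of the consequent given the antecedent:
    count(antecedent and consequent) / count(antecedent)."""
    both = count_containing(antecedent | consequent, transactions)
    return both / count_containing(antecedent, transactions)


print(support({"pen", "paper"}, TRANSACTIONS))       # 0.6
print(confidence({"pen"}, {"paper"}, TRANSACTIONS))  # 0.75
```

In this toy data, three of the four pen buyers also bought paper, so the pen-to-paper rule has confidence 0.75, while a pen-to-ink rule would score only 0.25.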

Compared with other data analysis applications, such as
structured queries (used in many commercial databases) or statistical analysis
software, data mining represents a difference of kind rather than degree. Many
simpler analytical tools utilize a verification-based approach, where the user develops 
a hypothesis and then tests the data to prove or disprove the hypothesis. For 
example, a user might hypothesize that a customer who buys a hammer will also buy
a box of nails. The effectiveness of this approach can be limited by the creativity of 
the user to develop various hypotheses, as well as the structure of the software being 
used. In contrast, data mining utilizes a discovery approach, in which algorithms can 
be used to examine several multidimensional data relationships simultaneously, 
identifying those that are unique or frequently represented. For example, a hardware
store may compare its customers' tool purchases with home ownership, type of



1 Two Crows Corporation, Introduction to Data Mining and Knowledge Discovery, Third 
Edition (Potomac, MD: Two Crows Corporation, 1999); Pieter Adriaans and Dolf Zantinge, 
Data Mining (New York: Addison Wesley, 1996). 

2 For a more technically-oriented definition of data mining, see
[http://searchcrm.techtarget.com/gDefinition/0,294236,sid11_gci211901,00.html].






automobile driven, age, occupation, income, and/or distance between residence and 
the store. As a result of its complex capabilities, two precursors are important for a
successful data mining exercise: a clear formulation of the problem to be solved and
access to the relevant data. 3
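The contrast between the verification and discovery approaches can be sketched in Python. In the hypothetical records below (all field names and values are invented), `verify` tests only the one hypothesis the user supplies, while `discover` scans every field/value pair and reports those whose rate of the target behavior is well above the overall baseline. The `min_lift` threshold of 1.3 is an arbitrary illustrative choice, not a standard value.

```python
# Hypothetical hardware-store customer records (illustrative only).
CUSTOMERS = [
    {"owns_home": True, "vehicle": "truck", "bought_hammer": True, "bought_nails": True},
    {"owns_home": True, "vehicle": "truck", "bought_hammer": True, "bought_nails": True},
    {"owns_home": True, "vehicle": "sedan", "bought_hammer": False, "bought_nails": True},
    {"owns_home": False, "vehicle": "sedan", "bought_hammer": True, "bought_nails": False},
    {"owns_home": False, "vehicle": "sedan", "bought_hammer": False, "bought_nails": False},
    {"owns_home": False, "vehicle": "truck", "bought_hammer": False, "bought_nails": False},
]


def rate(rows, target):
    """Share of `rows` in which the boolean field `target` is True."""
    return sum(1 for r in rows if r[target]) / len(rows)


def verify(customers, if_field, then_field):
    """Verification: test a single user-supplied hypothesis, 'if_field -> then_field'."""
    return rate([c for c in customers if c[if_field]], then_field)


def discover(customers, target, min_lift=1.3):
    """Discovery: scan every field/value pair and report those whose rate of
    `target` is at least `min_lift` times the overall baseline rate."""
    baseline = rate(customers, target)
    findings = []
    for field in sorted(k for k in customers[0] if k != target):
        for value in sorted({c[field] for c in customers}):
            group = [c for c in customers if c[field] == value]
            group_rate = rate(group, target)
            if baseline > 0 and group_rate / baseline >= min_lift:
                findings.append((field, value, group_rate))
    # Strongest associations first; sorted() keeps ties in scan order.
    return sorted(findings, key=lambda f: -f[2])


print(verify(CUSTOMERS, "bought_hammer", "bought_nails"))
print(discover(CUSTOMERS, "bought_nails"))
```

Here the user's hammer-and-nails hypothesis holds for two of the three hammer buyers, but the discovery scan also surfaces home ownership, the strongest association in this toy data, which the user never asked about.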

Reflecting this conceptualization of data mining, some observers consider data 
mining to be just one step in a larger process known as knowledge discovery in 
databases (KDD). Other steps in the KDD process, in progressive order, include data 
cleaning, data integration, data selection, data transformation, (data mining), pattern 
evaluation, and knowledge presentation. 4 
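The ordering of the KDD steps can be illustrated with a toy pipeline in Python. Each function below stands in for one of the seven steps named above, in the same order; the two data sources, the field names, and the 50% support threshold are hypothetical, and a real KDD process would use far more elaborate methods at each stage.

```python
from collections import Counter
from itertools import combinations

# Hypothetical source 1: point-of-sale records, including messy entries.
SALES = [
    {"customer_id": "c1", "item": " Pen "},
    {"customer_id": "c1", "item": "paper"},
    {"customer_id": "c2", "item": "PEN"},
    {"customer_id": "c2", "item": None},  # missing value
    {"customer_id": "c3", "item": "paper"},
    {"customer_id": "c3", "item": "pen"},
]
# Hypothetical source 2: loyalty-card profiles keyed by customer.
PROFILES = {"c1": {"region": "north"}, "c2": {"region": "south"}, "c3": {"region": "north"}}


def clean(rows):
    """Data cleaning: drop rows with a missing item and normalize spelling."""
    return [{**r, "item": r["item"].strip().lower()} for r in rows if r["item"]]


def integrate(rows, profiles):
    """Data integration: merge each sale with the customer's profile."""
    return [{**r, **profiles.get(r["customer_id"], {})} for r in rows]


def select(rows, region):
    """Data selection: keep only the records relevant to the question."""
    return [r for r in rows if r.get("region") == region]


def transform(rows):
    """Data transformation: collapse sales into one basket (item set) per customer."""
    baskets = {}
    for r in rows:
        baskets.setdefault(r["customer_id"], set()).add(r["item"])
    return list(baskets.values())


def mine(baskets):
    """Data mining: count how often each pair of items shares a basket."""
    counts = Counter()
    for basket in baskets:
        counts.update(combinations(sorted(basket), 2))
    return counts


def evaluate(counts, n_baskets, min_support=0.5):
    """Pattern evaluation: keep pairs found in at least `min_support` of baskets."""
    return {pair: n / n_baskets for pair, n in counts.items() if n / n_baskets >= min_support}


def present(patterns):
    """Knowledge presentation: turn surviving patterns into readable lines."""
    return [f"{a} + {b}: {share:.0%} of baskets" for (a, b), share in sorted(patterns.items())]


def kdd(sales, profiles, region):
    """Run the steps in order for one region."""
    baskets = transform(select(integrate(clean(sales), profiles), region))
    return present(evaluate(mine(baskets), len(baskets)))


print(kdd(SALES, PROFILES, "north"))  # ['paper + pen: 100% of baskets']
```

Note that the mining step only ever sees the cleaned, integrated, selected, and transformed data, which is why data quality problems upstream propagate directly into the patterns found.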

A number of advances in technology and business processes have contributed 
to a growing interest in data mining in both the public and private sectors. Some of 
these changes include the growth of computer networks, which can be used to 
connect databases; the development of enhanced search-related techniques such as 
neural networks and advanced algorithms; the spread of the client/server computing 
model, allowing users to access centralized data resources from the desktop; and an 
increased ability to combine data from disparate sources into a single searchable 
source. 5 

In addition to these improved data management tools, the increased availability 
of information and the decreasing costs of storing it have also played a role. Over the 
past several years there has been a rapid increase in the volume of information 
collected and stored, with some observers suggesting that the quantity of the world's
data approximately doubles every year. 6 At the same time, the costs of data storage 
have decreased significantly from dollars per megabyte to pennies per megabyte. 
Similarly, computing power has continued to double every 18-24 months, while the 
relative cost of computing power has continued to decrease. 7 

Data mining has become increasingly common in both the public and private 
sectors. Organizations use data mining as a tool to survey customer information, 
reduce fraud and waste, and assist in medical research. However, the proliferation 
of data mining has raised some implementation and oversight issues as well. These 
include concerns about the quality of the data being analyzed, the interoperability of 
the databases and software between agencies, and potential infringements on privacy. 
Also, there are some concerns that the limitations of data mining are being 
overlooked as agencies work to emphasize their homeland security initiatives. 



3 John Makulowich, "Government Data Mining Systems Defy Definition," Washington
Technology, 22 February 1999,
[http://www.washingtontechnology.com/news/13_22/tech_features/393-3.html].

4 Jiawei Han and Micheline Kamber, Data Mining: Concepts and Techniques (New York: 
Morgan Kaufmann Publishers, 2001), p. 7. 

5 Pieter Adriaans and Dolf Zantinge, Data Mining (New York: Addison Wesley, 1996), pp. 
5-6. 

6 Ibid., p. 2. 

7 Two Crows Corporation, Introduction to Data Mining and Knowledge Discovery, Third 
Edition (Potomac, MD: Two Crows Corporation, 1999), p. 4. 
Limitations of Data Mining as a Terrorist Detection Tool

While data mining products can be very powerful tools, they are not self-sufficient
applications. To be successful, data mining requires skilled technical and
analytical specialists who can structure the analysis and interpret the output that is
created. Consequently, the limitations of data mining are primarily data- or
personnel-related rather than technology-related. 8

Although data mining can help reveal patterns and relationships, it does not tell 
the user the value or significance of these patterns. These types of determinations 
must be made by the user. Similarly, the validity of the patterns discovered is 
dependent on how they compare to "real world" circumstances. For example, to 
assess the validity of a data mining application designed to identify potential terrorist 
suspects in a large pool of individuals, the user may test the model using data that 
includes information about known terrorists. However, while such a test may reaffirm
a particular profile, it does not necessarily mean that the application will identify a
suspect whose behavior significantly deviates from the original model.

Another limitation of data mining is that while it can identify connections 
between behaviors and/or variables, it does not necessarily identify a causal 
relationship. For example, an application may identify that a pattern of behavior, 
such as the propensity to purchase airline tickets shortly before the flight is
scheduled to depart, is related to characteristics such as income, level of education, 
and Internet use. However, that does not necessarily indicate that the ticket 
purchasing behavior is caused by one or more of these variables. In fact, the 
individual's behavior could be affected by some additional variable(s) such as 
occupation (the need to make trips on short notice), family status (a sick relative 
needing care), or a hobby (taking advantage of last-minute discounts to visit new
destinations). 9 
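The airline-ticket example can be reproduced with a small, hypothetical table of traveler counts in which occupation alone drives last-minute buying. Pooled across occupations, high-income travelers appear far more likely to buy at the last minute; within each occupation, income makes no difference. The counts and the two occupation categories are invented purely to illustrate how a confounding variable can produce an association without causation.

```python
# Hypothetical counts: (occupation, income, bought_last_minute, travelers).
# By construction, only occupation affects last-minute buying.
TABLE = [
    ("consultant", "high", True, 40), ("consultant", "high", False, 40),
    ("consultant", "low", True, 10), ("consultant", "low", False, 10),
    ("other", "high", True, 2), ("other", "high", False, 18),
    ("other", "low", True, 8), ("other", "low", False, 72),
]


def last_minute_rate(income, occupation=None):
    """Share of travelers with the given income (and, optionally, occupation)
    who bought their ticket at the last minute."""
    rows = [r for r in TABLE
            if r[1] == income and (occupation is None or r[0] == occupation)]
    total = sum(r[3] for r in rows)
    last_minute = sum(r[3] for r in rows if r[2])
    return last_minute / total


# Pooled data: income looks strongly associated with last-minute buying.
print(last_minute_rate("high"), last_minute_rate("low"))  # 0.42 0.18
# Stratified by occupation: the income gap disappears.
for job in ("consultant", "other"):
    print(job, last_minute_rate("high", job), last_minute_rate("low", job))
```

A pattern-finding tool given only the pooled data would report the income association; only an analyst who thinks to condition on occupation would see that it is not causal.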

Beyond these specific limitations, some researchers suggest that the
circumstances surrounding our knowledge of terrorism make data mining an ill-suited
tool for identifying (predicting) potential terrorists before an activity occurs.
Successful "predictive data mining" requires a significant number of known instances
of a particular behavior in order to develop valid predictive models. For example,
data mining used to predict types of consumer behavior (e.g., the likelihood of
someone shopping at a particular store, or the likelihood that a credit card purchase is
fraudulent) may be based on as many as millions of previous instances of the same
particular behavior. Moreover, even such a robust data set can still lead to false positives.
In contrast, a CATO Institute report suggests that the terrorist incidents or attempts
that occur each year are too few in number and too individually unique "to
enable the creation of valid predictive models." 10
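The false-positive concern can be made concrete with base-rate arithmetic. The sketch below assumes, purely for illustration, a screened population of 1,000,000 people containing 10 actual targets and a model that is 99% accurate in both directions (sensitivity and specificity of 0.99); none of these figures come from the sources cited here.

```python
def screening_outcomes(population, true_targets, sensitivity, specificity):
    """Expected results of screening a population with an imperfect model.

    sensitivity: share of actual targets the model flags.
    specificity: share of non-targets the model correctly clears.
    """
    non_targets = population - true_targets
    true_positives = true_targets * sensitivity
    false_positives = non_targets * (1 - specificity)
    # Precision: share of flagged people who are actual targets.
    precision = true_positives / (true_positives + false_positives)
    return true_positives, false_positives, precision


# Hypothetical inputs chosen only to illustrate the base-rate effect.
tp, fp, precision = screening_outcomes(1_000_000, 10, 0.99, 0.99)
print(f"true positives ~ {tp:.1f}, false positives ~ {fp:,.0f}, precision ~ {precision:.2%}")
```

Under these assumptions, roughly 10,000 people would be flagged in error for about 10 true targets, so fewer than one flag in a thousand would be correct. The rarer the behavior being predicted, the more the false positives dominate.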



8 Ibid., p. 2. 

9 Ibid., p. 1. 

10 Jeff Jonas and Jim Harper, Effective Counterterrorism and the Limited Role of Predictive 

(continued...) 
Data Mining Uses

Data mining is used for a variety of purposes in both the private and public 
sectors. Industries such as banking, insurance, medicine, and retailing commonly use 
data mining to reduce costs, enhance research, and increase sales. For example, the 
insurance and banking industries use data mining applications to detect fraud and 
assist in risk assessment (e.g., credit scoring). Using customer data collected over 
several years, companies can develop models that predict whether a customer is a 
good credit risk, or whether an accident claim may be fraudulent and should be 
investigated more closely. The medical community sometimes uses data mining to 
help predict the effectiveness of a procedure or medicine. Pharmaceutical firms use 
data mining of chemical compounds and genetic material to help guide research on 
new treatments for diseases. Retailers can use information collected through affinity 
programs (e.g., shoppers' club cards, frequent flyer points, contests) to assess the
effectiveness of product selection and placement decisions and coupon offers, and to
identify which products are often purchased together. Companies such as telephone service
providers and music clubs can use data mining to create a "churn analysis," to assess 
which customers are likely to remain as subscribers and which ones are likely to 
switch to a competitor. 11 

In the public sector, data mining applications were initially used as a means to 
detect fraud and waste, but they have grown to be used also for purposes such as
measuring and improving program performance. It has been reported that data 
mining has helped the federal government recover millions of dollars in fraudulent 
Medicare payments. 12 The Justice Department has been able to use data mining to 
assess crime patterns and adjust resource allotments accordingly. Similarly, the 
Department of Veterans Affairs has used data mining to help predict demographic 
changes in the constituency it serves so that it can better estimate its budgetary needs. 
Another example is the Federal Aviation Administration, which uses data mining to 
review plane crash data to recognize common defects and recommend precautionary 
measures. 13 

In addition, data mining has been increasingly cited as an important tool for 
homeland security efforts. Some observers suggest that data mining should be used 
as a means to identify terrorist activities, such as money transfers and 



10 (...continued) 

Data Mining, CATO Institute Policy Analysis No. 584, December 11, 2006, p. 8,
[http://www.cato.org/pubs/pas/pa584.pdf].

11 Two Crows Corporation, Introduction to Data Mining and Knowledge Discovery, Third
Edition (Potomac, MD: Two Crows Corporation, 1999), p. 5; Patrick Dillon, Data Mining:
Transforming Business Data Into Competitive Advantage and Intellectual Capital (Atlanta,
GA: The Information Management Forum, 1998), pp. 5-6.

12 George Cahlink, "Data Mining Taps the Trends," Government Executive Magazine, 
October 1, 2000, [http://www.govexec.com/tech/articles/1000managetech.htm]. 

13 Ibid.; for a more detailed review of the purpose for data mining conducted by federal 
departments and agencies, see U.S. General Accounting Office, Data Mining: Federal 
Efforts Cover a Wide Range of Uses, GAO Report GAO-04-548 (Washington: May 2004). 
communications, and to identify and track individual terrorists themselves, such as
through travel and immigration records. Initiatives that have attracted significant 
attention include the now-discontinued Terrorism Information Awareness (TIA) 
project 14 conducted by the Defense Advanced Research Projects Agency (DARPA), 
and the now-canceled Computer-Assisted Passenger Prescreening System II
(CAPPS II) that was being developed by the Transportation Security Administration (TSA).
CAPPS II is being replaced by a new program called Secure Flight. Other initiatives
that have been the subject of congressional interest include the Able Danger program 
and data collection and analysis projects being conducted by the National Security 
Agency (NSA). 

Terrorism Information Awareness (TIA) Program 

In the immediate aftermath of the September 11, 2001, terrorist attacks, many 
questions were raised about the country's intelligence tools and capabilities, as well 
as the government's ability to detect other so-called "sleeper cells," if, indeed, they 
existed. One response to these concerns was the creation of the Information 
Awareness Office (IAO) at the Defense Advanced Research Projects Agency 
(DARPA) 15 in January 2002. The role of IAO was "in part to bring together, under 
the leadership of one technical office director, several existing DARPA programs 
focused on applying information technology to combat terrorist threats." 16 The 
mission statement for IAO suggested that the emphasis on these technology programs 
was to "counter asymmetric threats by achieving total information awareness useful 
for preemption, national security warning, and national security decision making." 17 
To that end, the TIA project was to focus on three specific areas of research, 
anticipated to be conducted over five years, to develop technologies that would assist 
in the detection of terrorist groups planning attacks against American interests, both 
inside and outside the country. The three areas of research and their purposes were 
described in a DOD Inspector General report as: 



14 This project was originally identified as the Total Information Awareness project until 
DARPA publicly renamed it the Terrorism Information Awareness project in May 2003. 

Section 8131 of the FY2004 Department of Defense Appropriations Act (P.L. 108-87) 
prohibited further funding of TIA as a whole, while allowing unspecified subcomponents 
of the TIA initiative to be funded as part of DOD's classified budget, subject to the 
provisions of the National Foreign Intelligence Program, which restricts the processing and 
analysis of information on U.S. citizens. For further details regarding this provision, see 
CRS Report RL31805, Authorization and Appropriations for FY2004: Defense, by Amy 
Belasco and Stephen Daggett. 

15 DARPA "is the central research and development organization for the Department of 
Defense (DOD)" that engages in basic and applied research, with a specific focus on 
"research and technology where risk and payoff are both very high and where success may 
provide dramatic advances for traditional military roles and missions." 
[http://www.darpa.mil/] 

16 Department of Defense. May 20, 2003. Report to Congress Regarding the Terrorism 
Information Awareness Program, Executive Summary, p. 2. 

17 Department of Defense. May 20, 2003. Report to Congress Regarding the Terrorism 
Information Awareness Program, Detailed Information, p. 1 (emphasis added). 
... language translation, data search with pattern recognition and privacy
protection, and advanced collaborative and decision support tools. Language 
translation technology would enable the rapid analysis of foreign languages, both 
spoken and written, and allow analysts to quickly search the translated materials 
for clues about emerging threats. The data search, pattern recognition, and 
privacy protection technologies would permit analysts to search vast quantities 
of data for patterns that suggest terrorist activity while at the same time 
controlling access to the data, enforcing laws and policies, and ensuring detection 
of misuse of the information obtained. The collaborative reasoning and decision 
support technologies would allow analysts from different agencies to share 
data. 18 

Each part had the potential to improve the data mining capabilities of agencies 
that adopt the technology. 19 Automated rapid language translation could allow 
analysts to search and monitor foreign language documents and transmissions more 
quickly than currently possible. Improved search and pattern recognition 
technologies may enable more comprehensive and thorough mining of transactional 
data, such as passport and visa applications, car rentals, driver license renewals, 
criminal records, and airline ticket purchases. Improved collaboration and decision 
support tools might facilitate the search and coordination activities being conducted 
by different agencies and levels of government. 20 

In public statements, DARPA frequently described the TIA program as a
research and development project designed to create experimental prototype tools,
and stated that the research agency would use only "data that is legally available and
obtainable by the U.S. Government." 21 DARPA further emphasized that these tools
could be adopted and used by other agencies, and that DARPA itself would not be 
engaging in any actual-use data mining applications, although it could "support 
production of a scalable leave-behind system prototype." 22 In addition, some of the 
technology projects being carried out in association with the TIA program did not 
involve data mining. 23 However, the TIA program's overall emphasis on collecting, 



18 Department of Defense, Office of the Inspector General. December 12, 2003.
Information Technology Management: Terrorism Information Awareness Project
(D2004-033). p. 7.

19 It is important to note that while DARPA's mission is to conduct research and
development on technologies that can be used to address national-level problems, it would 
not be responsible for the operation of TIA, if it were to be adopted. 

20 For more details about the Terrorism Information Awareness program and related 
information and privacy laws, see CRS Report RL31730, Privacy: Total Information 
Awareness Programs and Related Information Access, Collection, and Protection Laws, by 
Gina Marie Stevens, and CRS Report RL31786, Total Information Awareness Programs: 
Funding, Composition, and Oversight Issues, by Amy Belasco. 

21 Department of Defense, DARPA, "Defense Advanced Research Project Agency's 
Information Awareness Office and Total Information Awareness Project," p. 1, 
[http://www.iwar.org.uk/news-archive/tia/iaotia.pdf].

22 Ibid., p. 2. 

23 Although most of the TIA-related projects did involve some form of data collection, the

(continued...) 






tracking, and analyzing data trails left by individuals served to generate significant 
and vocal opposition soon after John Poindexter made a presentation on TIA at the 
DARPATech 2002 Conference in August 2002. 24 

Critics of the TIA program were further incensed by two administrative aspects 
of the project. The first involved the Director of IAO, Dr. John M. Poindexter. 
Poindexter, a retired admiral, was, until that time, perhaps best known for his
alleged role in the Iran-contra scandal during the Reagan Administration. His 
involvement with the program caused many in the civil liberties community to 
question the true motives behind TIA. 25 The second source of contention involved 
TIA's original logo, which depicted an "all-seeing" eye atop a pyramid looking
down over the globe, accompanied by the Latin phrase scientia est potentia 
(knowledge is power). 26 Although DARPA eventually removed the logo from its 
website, it left a lasting impression. 

The continued negative publicity surrounding the TIA program contributed to 
the introduction of a number of bills in Congress that eventually led to the program's
dissolution. Among these bills was S. 188, the Data-Mining Moratorium Act of
2003, which, if passed, would have imposed a moratorium on the implementation of 
data mining under the TIA program by the Department of Defense, as well as any 
similar program by the Department of Homeland Security. An amendment included 
in the Omnibus Appropriations Act for Fiscal Year 2003 (P.L. 108-7) required the 
Director of Central Intelligence, the Secretary of Defense, and the Attorney General 
to submit a joint report to Congress within 90 days providing details about the TIA 
program. 27 Funding for TIA as a whole was prohibited with the passage of the 
FY2004 Department of Defense Appropriations Act (P.L. 108-87) in September 
2003. However, Section 8131 of the law allowed unspecified subcomponents of the 
TIA initiative to be funded as part of DOD's classified budget, subject to the



23 (...continued) 

primary purposes of some of these projects, such as war gaming, language translation, and 
biological agent detection, were less connected to data mining activities. For a description 
of these projects, see [http://www.fas.org/irp/agency/dod/poindexter.html]. 

24 The text of Poindexter's presentation is available at
[http://www.darpa.mil/DARPATech2002/presentations/iao_pdf/speeches/POINDEXT.pdf].
The slides from Poindexter's presentation are available at
[http://www.darpa.mil/DARPATech2002/presentations/iao_pdf/slides/PoindexterIAO.pdf].

25 Shane Harris, "Counterterrorism Project Assailed By Lawmakers, Privacy Advocates,"
Government Executive Magazine, 25 November 2002,
[http://www.govexec.com/dailyfed/1102/112502h1.htm].

26 The original logo can be found at
[http://www.thememoryhole.org/policestate/iao-logo.htm].

27 The report is available at [http://www.eff.org/Privacy/TIA/TIA-report.pdf]. Some of the 
information required includes spending schedules, likely effectiveness of the program, likely 
impact on privacy and civil liberties, and any laws and regulations that may need to be 
changed to fully deploy TIA. If the report was not submitted within 90 days, funding for the 
TIA program could have been discontinued. For more details regarding this amendment, 
see CRS Report RL31786, Total Information Awareness Programs: Funding, Composition,
and Oversight Issues, by Amy Belasco. 
provisions of the National Foreign Intelligence Program, which restricts the
processing and analysis of information on U.S. citizens. 28 

Computer-Assisted Passenger Prescreening System 
(CAPPS II) 

Similar to TIA, the CAPPS II project represented a direct response to the 
September 11, 2001, terrorist attacks. With the images of airliners flying into 
buildings fresh in people's minds, air travel was now widely viewed not only as a 
critically vulnerable terrorist target, but also as a weapon for inflicting larger harm. 
The CAPPS II initiative was intended to replace the original CAPPS, currently being
used. Spurred, in part, by the growing number of airplane bombings, the existing 
CAPPS (originally called CAPS) was developed through a grant provided by the 
Federal Aviation Administration (FAA) to Northwest Airlines, with a prototype 
system tested in 1996. In 1997, other major carriers also began work on screening 
systems, and, by 1998, most of the U.S.-based airlines had voluntarily implemented
CAPS, with the remaining few working toward implementation. 29 Also, during this 
time, the White House Commission on Aviation Safety and Security (sometimes 
referred to as the Gore Commission) released its final report in February 1997. 30 
Included in the commission's report was a recommendation that the United States 
implement automated passenger profiling for its airports. 31 On April 19, 1999, the 
FAA issued a notice of proposed rulemaking (NPRM) regarding the security of 
checked baggage on flights within the United States (docket no. FAA-1999-5536). 32
As part of this still-pending rule, domestic flights would be required to utilize "the
FAA-approved computer-assisted passenger screening (CAPS) system to select 
passengers whose checked baggage must be subjected to additional security 
measures." 33 

The current CAPPS system is a rule-based system that uses the information 
provided by the passenger when purchasing the ticket to determine if the passenger 
fits into one of two categories: "selectees" requiring additional security screening,
and those who do not. CAPPS also compares the passenger name to those on a list 
of known or suspected terrorists. 34 CAPPS II was described by TSA as "an enhanced 



28 For further details regarding this provision, see CRS Report RL31805, Authorization and
Appropriations for FY2004: Defense, by Amy Belasco and Stephen Daggett. 

29 Department of Transportation, White House Commission on Aviation and Security: The 
DOT Status Report, February 1998, [http://www.dot.gov/affairs/whcoasas.htm]. 

30 The Gore Commission was established by Executive Order 13015 on August 22, 1996, 
following the crash of TWA flight 800 in July 1996. 

31 White House Commission on Aviation Safety and Security: Final Report to President 
Clinton. February 12, 1997. [http://www.fas.org/irp/threat/212fin~1.html].

32 The docket can be found online at [http://dms.dot.gov/search/document.cfm 
?documentid=57279&docketid=5536]. 

33 Federal Register, 64 (April 19, 1999): 19220.

34 U.S. General Accounting Office, Aviation Security: Computer-Assisted Passenger 

(continued...) 
system to confirm the identities of passengers and to identify foreign terrorists or
persons with terrorist connections before they can board U.S. aircraft." 35 CAPPS II
would have sent information provided by the passenger in the passenger name
record (PNR), including full name, address, phone number, and date of birth, to 
commercial data providers for comparison to authenticate the identity of the 
passenger. The commercial data provider would have then transmitted a numerical 
score back to TSA indicating a particular risk level. 36 Passengers with a "green" 
score would have undergone "normal screening," while passengers with a "yellow" 
score would have undergone additional screening. Passengers with a "red" score 
would not have been allowed to board the flight, and would have received "the 
attention of law enforcement." 37 While drawing on information from commercial 
databases, TSA had stated that it would not see the actual information used to 
calculate the scores, and that it would not retain the traveler's information. 
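The difference between the existing rule-based CAPPS and the score-based tiers that TSA described for CAPPS II can be sketched in Python. The actual CAPPS rules, the CAPPS II score scale, and its cutoffs were not public, so the booking rule used here (cash payment for a one-way ticket), the 0 to 100 scale, the cutoffs of 40 and 80, and the watch-list entry are all hypothetical placeholders; only the three tiers and their consequences come from the description above.

```python
from dataclasses import dataclass

# Hypothetical watch list; real lists were maintained by the government.
WATCH_LIST = {"JOHN DOE"}

# Screening consequence of each CAPPS II tier, as described by TSA.
TIER_ACTIONS = {
    "green": "normal screening",
    "yellow": "additional screening",
    "red": "denied boarding; referred to law enforcement",
}


@dataclass
class Reservation:
    name: str
    paid_cash: bool  # hypothetical rule input
    one_way: bool    # hypothetical rule input


def capps_is_selectee(res: Reservation) -> bool:
    """CAPPS-style yes/no check: a selectee if the name matches the watch list
    or if a (hypothetical) booking rule fires."""
    if res.name.strip().upper() in WATCH_LIST:
        return True
    return res.paid_cash and res.one_way


def capps2_tier(score: float, yellow_at: float = 40, red_at: float = 80) -> str:
    """Map a numeric risk score (hypothetical 0-100 scale) to a CAPPS II tier."""
    if score >= red_at:
        return "red"
    if score >= yellow_at:
        return "yellow"
    return "green"


print(capps_is_selectee(Reservation("Jane Roe", paid_cash=True, one_way=True)))  # True
print(capps2_tier(55), TIER_ACTIONS[capps2_tier(55)])  # yellow additional screening
```

The sketch highlights a design difference: CAPPS yields a yes/no decision from fixed rules, whereas CAPPS II was to reduce each passenger to a single score, so the placement of the cutoffs would largely determine how many travelers land in each tier.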

TSA had planned to test the system at selected airports during spring 2004. 38 
However, CAPPS II encountered a number of obstacles to implementation. One 
obstacle involved obtaining the required data to test the system. Several high-profile 
debacles resulting in class-action lawsuits have made the U.S.-based airlines very
wary of voluntarily providing passenger information. In early 2003, Delta Airlines 
was to begin testing CAPPS II using its customers' passenger data at three airports 
across the country. However, Delta became the target of a vociferous boycott 
campaign, raising further concerns about CAPPS II generally. 39 In September 2003, 
it was revealed that JetBlue had shared private passenger information in September 2002
with Torch Concepts, a defense contractor, which was testing a data mining
application for the U.S. Army. The information shared reportedly included
itineraries, names, addresses, and phone numbers for 1.5 million passengers. 40 In
January 2004, it was reported that Northwest Airlines had provided personal information
on millions of its passengers to the National Aeronautics and Space Administration 
(NASA) from October to December 2001 for an airline security-related data mining 



34 (...continued) 

Prescreening System Faces Significant Implementation Challenges, GAO Report GAO-04- 
385, February 2004, pp. 5-6. 

35 Transportation Security Administration, "TSA's CAPPS II Gives Equal Weight to 
Privacy, Security," Press Release, March 11, 2003, [http://www.tsa.gov/public/display 
?theme=44&content=535]. 

36 Robert O'Harrow, Jr., "Aviation ID System Stirs Doubt," Washington Post, 14 March 
2003, p. A16. 

37 Transportation Security Administration, "TSA's CAPPS II Gives Equal Weight to 
Privacy, Security," Press Release, March 11, 2003, [http://www.tsa.gov/public/ 
display ?theme=44&content=5 35]. 

38 Sara Kehaulani Goo, "U.S. to Push Airlines for Passenger Records," Washington Post, 
January 12, 2004, p. Al. 

39 The Boycott Delta website is available at [http://www.boycottdelta.org]. 

40 Don Phillips, "JetBlue Apologizes for Use of Passenger Records," The Washington Post, 
20 September 2003, p. El ; Sara Kehaulani Goo, "TSA Helped JetBlue Share Data, Report 
Says," Washington Post, February 21, 2004, p. El. 
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experiment. In April 2004, it was revealed that American Airlines agreed to 
provide private passenger data on 1.2 million of its customers to TSA in June 2002, 
although the information was sent instead to four companies competing to win a 
contract with TSA. 42 Further instances of data being provided for the purpose of 
testing CAPPS JJ were brought to light during a Senate Committee on Government 
Affairs confirmation hearing on June 23, 2004. In his answers to the committee, the 
acting director of TSA, David M. Stone, stated that during 2002 and 2003 four 
airlines; Delta, Continental, America West, and Frontier, and two travel reservation 
companies; Galileo International and Sabre Holdings, provided passenger records to 
TSA and/or its contractors. 43 

Concerns about privacy protections had also dissuaded the European Union 
(EU) from providing any data to TSA to test CAPPS II. However, in May 2004, the
EU signed an agreement with the United States that would have allowed PNR data
for flights originating from the EU to be used in testing CAPPS II, but only after TSA
was authorized to use domestic data as well. As part of the agreement, the EU data
was to be retained for only three-and-a-half years (unless it was part of a law
enforcement action), only 34 of the 39 elements of the PNR were to be accessed by 
authorities, 44 and there were to be yearly joint DHS-EU reviews of the 
implementation of the agreement. 45 

Another obstacle was the perception of mission creep. CAPPS II was originally 
intended only to screen for high-risk passengers who might pose a threat to safe air
travel. However, in an August 1, 2003, Federal Register notice, TSA stated that 
CAPPS II could also be used to identify individuals with outstanding state or federal 
arrest warrants, as well as identify both foreign and domestic terrorists (not just 
foreign terrorists). The notice also stated that CAPPS II could be "linked with the
U.S. Visitor and Immigrant Status Indicator Technology (US-VISIT) program" to
identify individuals who are in the country illegally (e.g., individuals with expired
visas, illegal aliens, etc.). 46 In response to critics who cited these possible uses as
examples of mission creep, TSA claimed that the suggested uses were consistent with
the goals of improving aviation security. 47

41 Sara Kehaulani Goo, "Northwest Gave U.S. Data on Passengers," Washington Post,
January 18, 2004, p. A1.

42 Sara Kehaulani Goo, "American Airlines Revealed Passenger Data," Washington Post,
April 10, 2004, p. D12.

43 For the written responses to the committee's questions, see
[http://www.epic.org/privacy/airtravel/stone_answers.pdf]; Sara Kehaulani Goo, "Agency
Got More Airline Records," Washington Post, June 24, 2004, p. A16.

44 Some information, such as meal preferences, which could be used to infer religious
affiliation, and health considerations will not be made available. Sara Kehaulani Goo,
"U.S., EU Will Share Passenger Records," Washington Post, May 29, 2004, p. A2.

45 Department of Homeland Security, "Fact Sheet: US-EU Passenger Name Record
Agreement Signed," May 28, 2004, [http://www.dhs.gov/dhspublic/display?content=3651].

46 Federal Register, Vol. 68, No. 148, August 1, 2003, p. 45266; U.S. General Accounting
Office, Aviation Security: Challenges Delay Implementation of Computer-Assisted
Passenger Prescreening System, GAO Testimony GAO-04-504T, March 17, 2004, p. 17.

Several other concerns had also been raised, including the length of time 
passenger information was to be retained, who would have access to the information, 
the accuracy of the commercial data being used to authenticate a passenger's identity,
the creation of procedures to allow passengers the opportunity to correct data errors 
in their records, and the ability of the system to detect attempts by individuals to use 
identity theft to board a plane undetected. 

Secure Flight. In August 2004, TSA announced that the CAPPS II program 
was being canceled and would be replaced with a new system called Secure Flight. 
In the Department of Homeland Security Appropriations Act, 2005 (P.L. 108-334), 
Congress included a provision (Sec. 522) prohibiting the use of appropriated funds 
for "deployment or implementation, on other than a test basis," of CAPPS II, Secure 
Flight, "or other follow on/successor programs," until GAO had certified that such
a system had met all of the privacy requirements enumerated in a February 2004
GAO report, 48 could accommodate any unique air transportation needs as it relates to
interstate transportation, and that "appropriate life-cycle cost estimates, and
expenditure and program plans exist." GAO's certification report 49 was delivered to
Congress in March 2005. In its report, GAO found that while "TSA is making 
progress in addressing key areas of congressional interest ... TSA has not yet 
completed these efforts or fully addressed these areas, due largely to the current stage 
of the program's development." 50 In follow-up reports in February 2006 51 and June 
2006, 52 GAO reiterated that while TSA continued to make progress, the Secure Flight 
program still suffered from systems development and program management
problems, preventing it from meeting its congressionally mandated privacy
requirements. In early 2006, TSA suspended development of Secure Flight in order
to "rebaseline," or reassess, the program.

47 U.S. General Accounting Office, Aviation Security: Challenges Delay Implementation
of Computer-Assisted Passenger Prescreening System, GAO Testimony GAO-04-504T,
March 17, 2004, p. 17.

48 The eight issues included establishing an oversight board, ensuring the accuracy of the
data used, conducting stress testing, instituting abuse prevention practices, preventing
unauthorized access, establishing clear policies for the operation and use of the system,
satisfying privacy concerns, and creating a redress process. U.S. General Accounting
Office, Aviation Security: Computer-Assisted Passenger Prescreening System Faces
Significant Implementation Challenges, GAO Report GAO-04-385, February 2004.

49 U.S. Government Accountability Office, Aviation Security: Secure Flight Development
and Testing Under Way, but Risks Should Be Managed as System is Further Developed,
GAO Report GAO-05-356, March 2005.

50 Ibid., p. 4; for a more detailed analysis of the Secure Flight program, see CRS Report
RL32802, Homeland Security: Air Passenger Screening and Counterterrorism, by Bart
Elias and William Krouse.

51 U.S. Government Accountability Office, Aviation Security: Significant Management
Challenges May Adversely Affect the Implementation of the Transportation Security
Administration's Secure Flight Program, GAO Testimony GAO-06-374T.

52 U.S. Government Accountability Office, Aviation Security: Management Challenges
Remain for the Transportation Security Administration's Secure Flight Program, GAO
Testimony GAO-06-864T.

In December 2006, the DHS Privacy Office released a report comparing TSA's 
published privacy notices with its actual practices regarding Secure Flight. The DHS 
Privacy Office found that there were discrepancies related to data testing and 
retention, in part because the privacy notices "were drafted before the testing
program had been designed fully." However, the report also points out that 

material changes in a federal program's design that have an impact on the 
collection, use, and maintenance of personally identifiable information of 
American citizens are required to be announced in Privacy Act system notices 
and privacy impact assessments. 53 

In a February 2007 interview, TSA Administrator Kip Hawley reportedly stated that
while TSA had developed a means to improve the accuracy, privacy, and reliability
of Secure Flight, it would take approximately one-and-a-half
years to complete. This would be followed by an additional year of testing, leading 
to an anticipated implementation in 2010. 54 

On August 23, 2007, TSA published a notice of proposed rulemaking (NPRM) 
for implementing Secure Flight, as well as an NPRM proposing Privacy Act 
exemptions for Secure Flight, 55 in the Federal Register. A Privacy Act System of 
Records Notice (SORN) 56 was also published in the same edition of the Federal 
Register. In addition, a Privacy Impact Assessment (PIA) for Secure Flight was 
posted on the TSA website. 57 

Along with the Secure Flight NPRM, on August 23, 2007, TSA published a 
related but separate final rule regarding the Advance Passenger Information System 
(APIS) administered by U.S. Customs and Border Protection (CBP) for screening
passengers of international flights departing from or arriving in the United States. 58
TSA states

We propose that, when the Secure Flight rule becomes final, aircraft operators
would submit passenger information to DHS through a single DHS portal for
both the Secure Flight and APIS programs. This would allow DHS to integrate
the watch list matching component of APIS into Secure Flight, resulting in one
DHS system responsible for watch list matching for all aviation passengers. 59

53 U.S. Department of Homeland Security, Privacy Office, Report to the Public on the
Transportation Security Administration's Secure Flight Program and Privacy
Recommendations, December 2006, p. 13,
[http://www.dhs.gov/xlibrary/assets/privacy/privacy-secure-flight-122006.pdf].

54 Eric Lipton, "U.S. Official Admits to Big Delay in Revamping No-Fly Program," New
York Times, February 21, 2007, p. A17.

55 Department of Homeland Security, Transportation Security Administration, "Privacy Act
of 1974: Implementation of Exemptions; Secure Flight Records," 72 Federal Register
48397, August 23, 2007.

56 Department of Homeland Security, Transportation Security Administration, "Privacy Act
of 1974: System of Records; Secure Flight Records," 72 Federal Register 48392, August
23, 2007.

57 See [http://www.tsa.gov/assets/pdf/pia_secureflight.pdf].

According to the August 23, 2007 Secure Flight NPRM, in accordance with the 
Intelligence Reform and Terrorism Prevention Act (IRTPA), "TSA would receive 
passenger and certain non-traveler information, conduct watch list matching against 
the No Fly and Selectee portions of the Federal Government's consolidated terrorist
watch list, and transmit boarding pass printing instructions back to aircraft 
operators." 60 Currently, air carriers are responsible for comparing passenger 
information to that on government watch lists. 
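The proposed flow (receive passenger data, check it against the No Fly and Selectee lists, return a boarding pass instruction) can be sketched in a few lines. This is a minimal illustration, not the NPRM's matching algorithm: it assumes exact matching on a crudely normalized full name only, and the `Passenger` class, the `normalize` rule, and the instruction codes `INHIBITED`, `SELECTEE`, and `CLEARED` are hypothetical. A realistic matcher would also weigh date of birth, gender, and the other SFPD fields and would tolerate spelling variants.

```python
import re
from dataclasses import dataclass

@dataclass(frozen=True)
class Passenger:
    full_name: str  # as it appears on a verifying identity document

def normalize(name: str) -> str:
    """Lowercase, drop everything but a-z and whitespace, collapse spaces (deliberately crude)."""
    letters_only = re.sub(r"[^a-z\s]", "", name.lower())
    return " ".join(letters_only.split())

def boarding_pass_instruction(passenger: Passenger, no_fly: set, selectee: set) -> str:
    """Return a hypothetical instruction code to send back to the aircraft operator.

    Both watch lists are assumed to contain names already passed through normalize().
    """
    key = normalize(passenger.full_name)
    if key in no_fly:
        return "INHIBITED"  # do not issue a boarding pass
    if key in selectee:
        return "SELECTEE"   # issue a boarding pass flagged for additional screening
    return "CLEARED"        # issue a boarding pass normally

if __name__ == "__main__":
    no_fly = {normalize("John Q. Example")}   # hypothetical list entries
    selectee = {normalize("Jane Sample")}
    for name in ("JOHN Q EXAMPLE", "jane  sample", "Alex Traveler"):
        print(name, "->", boarding_pass_instruction(Passenger(name), no_fly, selectee))
```

Exact matching on names alone illustrates why the accuracy and redress concerns raised for CAPPS II carry over: any two travelers whose normalized names coincide receive the same instruction.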

The NPRM states that TSA would collect Secure Flight Passenger Data that 
includes a combination of required and optional information. Passengers would be 
required to provide their full names, "as it appears on a verifying identity document 
held by that individual." 61 In addition, passengers would be asked, but not required, 
to provide their date of birth, gender, Redress Number, or known traveler number.
However, the NPRM does propose circumstances in which aircraft operators would
be required to provide the optional information to TSA if they had already obtained that
information "in the ordinary course of business." The NPRM states 

If a covered aircraft operator were to input data required to be requested from 
individuals into the system where it stores SFPD — such as data from a 
passenger profile stored by the aircraft operator in the ordinary course of 
business — the aircraft operator would be required to include that data as part of 
the SFPD transmitted to TSA, even though the individual did not provide that 
information at the time of reservation. 62 

In addition, aircraft operators would be required to provide TSA, if available, a
passenger's passport information, and "certain non-personally identifiable data
fields" including itinerary information, reservation control number, record sequence
number, record type, passenger update indicator, and traveler reference number.
Secure Flight would not utilize commercial data to verify identities, nor would it use
algorithms to assign risk scores to individuals. 64

58 Department of Homeland Security, Bureau of Customs and Border Protection, "Advance
[...] and Vessels," 72 Federal Register 48320, August 23, 2007.

59 Department of Homeland Security, Transportation Security Administration, "Secure
Flight Program," 72 Federal Register 48356, August 23, 2007.

60 Department of Homeland Security, Transportation Security Administration, "Secure
Flight Program," 72 Federal Register 48356, August 23, 2007.

61 Ibid., p. 48369.

62 Ibid., p. 48364.

In the NPRM, TSA proposes a tiered data retention schedule. The purpose for
retaining the records would be to facilitate a redress process, expedite future travel, 
and investigate and document terrorist events. Under this schedule, the records for 
"individuals not identified as potential matches by the automated matching tool 
would be retained for seven days" after the completion of directional travel. The 
records for individuals identified as "potential matches" would be retained for seven 
years following the completion of directional travel. The records of individuals 
identified as "confirmed matches" would be retained for 99 years. 65 
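The tiered schedule is simple enough to state as a function of match status and the date directional travel was completed. The sketch below assumes that "directional travel" ends on a single calendar date and that a "year" is a calendar year (with a February 29 start falling back to February 28); the names `MatchStatus`, `add_years`, and `retention_end` are hypothetical and are not drawn from the NPRM.

```python
from datetime import date, timedelta
from enum import Enum

class MatchStatus(Enum):
    NOT_A_MATCH = "not identified as a potential match"
    POTENTIAL_MATCH = "potential match"
    CONFIRMED_MATCH = "confirmed match"

def add_years(d: date, years: int) -> date:
    """Add calendar years; a February 29 start falls back to February 28 in non-leap years."""
    try:
        return d.replace(year=d.year + years)
    except ValueError:
        return d.replace(year=d.year + years, day=28)

def retention_end(travel_completed: date, status: MatchStatus) -> date:
    """Date on which the proposed retention period for a record would end."""
    if status is MatchStatus.NOT_A_MATCH:
        return travel_completed + timedelta(days=7)
    if status is MatchStatus.POTENTIAL_MATCH:
        return add_years(travel_completed, 7)
    return add_years(travel_completed, 99)

if __name__ == "__main__":
    done = date(2008, 2, 29)  # hypothetical completion of directional travel
    for status in MatchStatus:
        print(f"{status.value}: retained until {retention_end(done, status)}")
```

Written this way, the spread between tiers is easy to see: the same trip generates a record kept for one week, seven years, or 99 years depending solely on the output of the automated matching step.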

The original NPRM included a 60-day comment period ending on October 22,
2007. However, in response to requests for a deadline extension, on October 24,
2007, TSA published a notice in the Federal Register extending the public comment 
period an additional 30 days, ending November 21, 2007. 66 

On November 9, 2007, TSA published a final SORN 67 and a final rule regarding 
Privacy Act exemptions for Secure Flight. 68 

Multistate Anti-Terrorism Information Exchange (MATRIX) Pilot Project

Similar to TIA and CAPPS II, which were born out of an initial reaction to
concerns about terrorism, the impetus and initial work on MATRIX grew out of the 
September 11, 2001 terrorist attacks. MATRIX was initially developed by Seisint, 
a Florida-based information products company, in an effort to facilitate collaborative 
information sharing and factual data analysis. At the outset of the project, MATRIX 
included a component Seisint called the High Terrorist Factor (HTF). Within days 
of the terrorist attacks, based on an analysis of information that included "age and 
gender, what they did with their drivers license, either pilots or associations to pilots, 
proximity to 'dirty' addresses/phone numbers, investigational data, how they 
shipped; how they received, social security number anomalies, credit history, and 
ethnicity," Seisint generated a list of 120,000 names with high HTF scores, or so-called
terrorism quotients. Seisint provided this list to the Federal Bureau of
Investigation (FBI), the Immigration and Naturalization Service (INS), the United 
States Secret Service (USSS), and the Florida Department of Law Enforcement 
(FDLE), which, according to a January 2003 presentation made by the company, led
to "several arrests within one week" and "scores of other arrests." 69 Although the 
HTF scoring system appeared to attract the interest of officials, this feature was 
reportedly dropped from MATRIX because it relied on intelligence data not normally
available to the law enforcement community and because of concerns about privacy abuses.
However, some critics of MATRIX continued to raise questions about HTF, citing 
the lack of any publicly available official documentation verifying such a decision. 70 

As a pilot project, MATRIX was administered through a collaborative effort 
between Seisint, the FDLE, 71 and the Institute for Intergovernmental Research (IIR),
a "Florida-based nonprofit research and training organization, [that] specializes in
law enforcement, juvenile justice, and criminal justice issues." 72 The FDLE
served as the "Security Agent" for MATRIX, administering control over which
agencies and individuals had access to the system. Florida was also a participating
state in MATRIX. IIR was responsible for
administrative support, and was the grantee for federal funds received for 
MATRIX. 73 

The analytical core of the MATRIX pilot project was an application called 
Factual Analysis Criminal Threat Solution (FACTS). FACTS was described as a 
"technological, investigative tool allowing query-based searches of available state 
and public records in the data reference repository." 74 The FACTS application 
allowed an authorized user to search "dynamically combined records from disparate 
datasets" based on partial information, and would "assemble" the results. 75 The data
reference repository used with FACTS represented the amalgamation of over 3.9 
billion public records collected from thousands of sources. 76 Some of the data 
contained in FACTS included FAA pilot licenses and aircraft ownership records, 
property ownership records, information on vessels registered with the Coast Guard, 
state sexual offenders lists, federal terrorist watch lists, corporation filings, Uniform
Commercial Code filings, bankruptcy filings, state-issued professional licenses,
criminal history information, department of corrections information and photo
images, driver's license information and photo images, motor vehicle registration
information, and information from commercial sources that "are generally available
to the public or legally permissible under federal law." 77 The data reference
repository purportedly excluded data such as telemarketing call lists, direct mail
mailing lists, airline reservations or travel records, frequent flyer/hotel stay program
membership or activity, magazine subscriptions, information about purchases made
at retailers or over the Internet, telephone calling logs or records, credit or debit card
numbers, mortgage or car payment information, bank account numbers or balance
information, birth certificates, marriage licenses, divorce decrees, or utility bill
payment information.

69 A copy of the presentation is available at [http://www.aclu.org/Files/OpenFile.cfm?id=

Participating law enforcement agencies utilized this information sharing and 
data mining resource over the Regional Information Sharing Systems (RISS) secure 
intranet (RISSNET). The RISS Program is an established system of six regional 
centers that are used to "share intelligence and coordinate efforts against criminal 
networks that operate in many locations across jurisdictional lines." 78 The RISS 
Program is used to combat traditional law enforcement targets, such as drug 
trafficking and violent crime, as well as other activities, such as terrorism and 
cybercrime. According to its website, RISS has been in operation for nearly 25 years, 
and has "member agencies in all 50 states, the District of Columbia, U.S. territories, 
Australia, Canada, and England." 79 

Some critics of MATRIX suggested that the original intentions and design of 
the pilot project echoed those of DARPA's highly criticized TIA program. 80 
However, while it is difficult to ascribe intention, an ongoing series of problems did 
appear to have affected the trajectory of the project. In August 2003, Hank Asher, 
the founder of Seisint, resigned from the company's board of directors after questions
about his criminal history were raised during contract negotiations between Seisint 
and the Florida Department of Law Enforcement. In the 1980s, Asher was allegedly 
a pilot in several drug smuggling cases. However, he was reportedly never charged 
in the cases in exchange for his testimony at state and federal trials. Similar concerns 
had surfaced in 1999 when the FBI and the U.S. Drug Enforcement Administration (DEA)
reportedly cancelled contracts with an earlier company Asher founded, DBT Online, 
Inc. 81

Some civil liberties organizations also raised concerns about law enforcement
actions being taken based on algorithms and analytical criteria developed by a private
corporation, in this case Seisint, without any public or legislative input. 82 Questions
also were raised about the level of involvement of the federal government,
particularly the Department of Homeland Security and the Department of Justice, in
a project that was ostensibly focused on supporting state-based information sharing. 83
It has been reported that the MATRIX pilot project received a total of $12 million
in federal funding — $8 million from the Office of Domestic Preparedness (ODP) 
at the Department of Homeland Security (DHS), and $4 million from the Bureau of 
Justice Assistance (BJA) at the Department of Justice (DOJ). 84 

The MATRIX pilot project also suffered some setbacks in recruiting states to 
participate. The lack of participation can be especially troubling for a networked 
information sharing project such as MATRIX because, as Metcalfe's Law suggests, 
"the power of the network increases exponentially by the number of computers 
connected to it." 85 While as many as 16 states were reported to have either 
participated or seriously considered participating in MATRIX, several chose to 
withdraw, leaving a total of four states (Connecticut, Florida, Ohio, and 
Pennsylvania) at the conclusion of the pilot on April 15, 2005. State officials cited 
a variety of reasons for not participating in MATRIX, including costs, concerns about 
violating state privacy laws, and duplication of existing resources. 86 
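Metcalfe's Law is more commonly stated as the value of a network growing in proportion to the square of the number of connected participants, not exponentially, because n participants can form n(n - 1)/2 distinct pairwise connections. The short calculation below applies that count to the MATRIX participation figures above, under the simplifying assumption that each state is a single node and that every pair of states could usefully exchange data.

```python
def pairwise_links(n: int) -> int:
    """Distinct two-way connections possible among n participants: n(n - 1) / 2."""
    if n < 0:
        raise ValueError("n must be non-negative")
    return n * (n - 1) // 2

if __name__ == "__main__":
    for states in (16, 4):
        print(f"{states} states -> {pairwise_links(states)} possible state-to-state links")
    # 16 states -> 120 possible state-to-state links
    # 4 states -> 6 possible state-to-state links
```

Under these assumptions, shrinking from 16 interested states to 4 participants cuts the number of possible state-to-state links from 120 to 6, a twentyfold drop, which is why attrition weighs so heavily on a networked project of this kind.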

In its news release announcing the conclusion of the pilot, the FDLE stated that 
as a proof-of-concept pilot study from July 2003 to April 2005, MATRIX had 
achieved many "operational successes." Among the statistics cited, the news release
stated that 

• Between July 2003 and April 2005, there have been 1,866,202 
queries to the FACTS application. 

• As of April 8, 2005, there were 963 law enforcement users accessing 
FACTS. 

• FACTS assisted a variety of investigations. On average, cases 
pertained to the following: 

• Fraud — 22.6% 

• Robbery — 18.8%

• Sex Crime Investigations — 8.6% 

• Larceny and Theft — 8.3% 

• Extortion/Blackmail — 7.0% 

• Burglary/Breaking and Entering — 6.8% 

• Stolen Property — 6.2% 

• Terrorism/National Security — 2.6% 

• Other — 19.1% (e.g., assault, arson, narcotics, homicide) 

It was also announced that, while the pilot study would not be continued due to a lack
of additional federal funding, Florida and other participating states were
"independently negotiating the continued use of the FACTS application for use 
within their individual state[s]." 87 

Other Data Mining Initiatives 

Able Danger. In summer 2005, news reports began to appear regarding a data 
mining initiative that had been carried out by the U.S. Army's Land Information 
Warfare Agency (LIWA) in 1999-2000. The initiative, referred to as Able Danger, 
had reportedly been requested by the U.S. Special Operations Command (SOCOM) 
as part of a larger effort to develop a plan to combat transnational terrorism. Because
the details of Able Danger remain classified, little is known about the program. 
However, in a briefing to reporters, the Department of Defense characterized Able 
Danger as a demonstration project to test analytical methods and technology on very 
large amounts of data. 88 The project involved using link analysis to identify 
underlying connections and associations between individuals who otherwise appear 
to have no outward connection with one another. The link analysis used both 
classified and open source data, totaling a reported 2.5 terabytes. 89 All of this data, 
which included information on U.S. persons, was reportedly deleted in April 2000 
due to U.S. Army regulations requiring that information on U.S. persons be destroyed
after a project ends or becomes inactive. 90 
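Because the details of Able Danger remain classified, nothing below describes its actual methods. As a generic illustration of what link analysis means here, the sketch treats people and identifiers (phone numbers, addresses) as nodes in a graph, with an edge wherever two appear together in a record, and uses breadth-first search to find the shortest chain connecting two people who share no direct association. The function name `shortest_link_chain` and all sample data are hypothetical.

```python
from collections import deque

def shortest_link_chain(edges, start, goal):
    """Breadth-first search for the shortest chain of associations linking start to goal.

    edges: iterable of (a, b) string pairs meaning a and b appear together in some record.
    Returns the chain as a list of nodes, or None if the two are not connected.
    """
    graph = {}
    for a, b in edges:
        graph.setdefault(a, set()).add(b)
        graph.setdefault(b, set()).add(a)
    if start not in graph or goal not in graph:
        return None
    previous = {start: None}  # maps each visited node to the node it was reached from
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if node == goal:
            chain = []
            while node is not None:
                chain.append(node)
                node = previous[node]
            return chain[::-1]
        for neighbor in sorted(graph[node]):  # sorted for reproducible output
            if neighbor not in previous:
                previous[neighbor] = node
                queue.append(neighbor)
    return None

if __name__ == "__main__":
    # Entirely hypothetical records linking people to shared identifiers.
    edges = [
        ("Person A", "Phone 1"),
        ("Person B", "Phone 1"),
        ("Person B", "Address 7"),
        ("Person C", "Address 7"),
        ("Person D", "Phone 2"),
    ]
    print(shortest_link_chain(edges, "Person A", "Person C"))
    print(shortest_link_chain(edges, "Person A", "Person D"))
```

Person A and Person C share no record, yet the search surfaces a chain through a shared phone number and a shared address. The same mechanics explain why such analysis sweeps in U.S. persons: every intermediate node on a chain is itself someone's data.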

Interest in Able Danger was largely driven by controversy over allegations that 
the data mining analysis had resulted in the identification of Mohammed Atta, one 
of the 9/11 hijackers, as a terrorist suspect before the attacks took place. While some 
individuals who had been involved in Able Danger were reportedly prepared to 
testify that they had seen either his name and/or picture on a chart prior to the attacks, 
the identification claim was strongly disputed by others. 

On September 21, 2005, the Senate Committee on the Judiciary held a hearing 
on Able Danger to consider how the data could or should have been shared with other 
agencies, and whether the destruction of the data was in fact required by the relevant 
regulations. While the Department of Defense directed the individuals involved in 
Able Danger not to testify at the hearing, testimony was taken from the attorney of 
one of the individuals, as well as others not directly involved with the project. 

On February 15, 2006, the House Committee on Armed Services Subcommittee 
on Strategic Forces and Subcommittee on Terrorism, Unconventional Threats and 
Capabilities held a joint hearing on Able Danger. The first half of the hearing was 
held in open session while the second half of the hearing was held in closed session 
to allow for the discussion of classified information. Witnesses testifying during the 
open session included Stephen Cambone, Under Secretary of Defense for Intelligence;
Erik Kleinsmith, Anthony Shaffer, and J.D. Smith.

In September 2006, a Department of Defense Inspector General report regarding 
Able Danger was released. The investigation examined allegations of 
mismanagement of the Able Danger program and reprisals against Lieutenant 
Colonel (LTC) Anthony Shaffer, a member of the U.S. Army Reserve and civilian 
employee of the Defense Intelligence Agency (DIA). The DoD Inspector General 
"found some procedural oversights concerning the DIA handling of LTC Shaffer's 
office contents and his Officer Evaluation Reports." However, the investigation 
found that 

The evidence did not support assertions that Able Danger identified the 
September 11, 2001, terrorists nearly a year before the attack, that Able Danger 
team members were prohibited from sharing information with law enforcement 
authorities, or that DoD officials reprised against LTC Shaffer for his disclosures 
regarding Able Danger. 91 

In December 2006, the then-Chairman and then-Vice Chairman of the Senate
Select Committee on Intelligence, Senator Roberts and Senator Rockefeller,
respectively, released a letter summarizing the findings of a review of Able Danger
conducted by Committee staff. 92 According to the letter, the results of the review, 
begun in August 2005, "were confirmed in all respects by the DoD Inspector General 
investigation of the Able Danger program (Case Number H05L9790521)." The letter 
further stated that the review "revealed no evidence to support the underlying Able 
Danger allegations" and that the Committee considered the matter "closed." 

Automated Targeting System (ATS). On November 2, 2006, DHS posted 
a System of Records Notice (SORN) in the Federal Register regarding the 
deployment of the Automated Targeting System (ATS) to screen travelers entering
the United States by car, plane, ship, or rail. 93 Originally developed to help identify
potential cargo threats, ATS is a module of the Treasury Enforcement
Communications System (TECS). TECS is described as an "overarching law
enforcement information collection, targeting, and sharing environment." ATS is run
by the Bureau of Customs and Border Protection (CBP). The Federal Register notice
states that "ATS builds a risk assessment for cargo, conveyances, and travelers based
on criteria and rules developed by CBP." The notice further states that "ATS both
collects information directly, and derives other information from various systems." 
Information collected may be retained for up to forty years "to cover the potentially 
active lifespan of individuals associated with terrorism or other criminal activities." 

According to a November 22, 2006 privacy impact assessment, ATS itself is 
composed of six modules: 

• ATS-Inbound — inbound cargo and conveyances (rail, truck, ship, 
and air) 

• ATS-Outbound — outbound cargo and conveyances (rail, truck, 
ship, and air) 

• ATS-Passenger (ATS-P) — travelers and conveyances (air, ship, and 
rail) 

• ATS-Land (ATS-L) — private vehicles arriving by land 

• ATS-International (ATS-I) — cargo targeting for CBP's
collaboration with foreign customs authorities 

• ATS-Trend Analysis and Analytical Selectivity Program (ATS-TAP)
(analytical module) 94 

According to DHS, "ATS historically was covered by the SORN for TECS."
The November 2, 2006 SORN was "solely to provide increased notice and
transparency to the public about ATS" and "did not describe any new collection of
information." 95 However, the disclosure raised a number of issues about various
facets of the program, including proposed exemptions from the Privacy Act;
opportunities for citizens to correct errors in the records; how the risk assessments
are created; whether any previous testing had been conducted; and the effectiveness of
the system.

92 A copy of the letter is available at [http://www.intelligence.senate.gov/abledanger.pdf].

93 Department of Homeland Security, Office of the Secretary, "Privacy Act of 1974; System
of Records," 71 Federal Register 64543, November 2, 2006.

94 Department of Homeland Security, Privacy Impact Assessment for the Automated
Targeting System, November 22, 2006, p. 3,
[http://www.dhs.gov/xlibrary/assets/privacy/privacy_pia_cbp_ats.pdf].

In its July 6, 2007 report to Congress, the DHS Privacy Office stated that of the 
six modules that compose ATS, only two — ATS Inbound and ATS Outbound 
(which became operational in 1997) — "engage in data mining to provide decision 
support analysis for targeting of cargo for suspicious activity." 96 In contrast, the DHS 
Privacy Office report states that the ATS Passenger module does not meet the 
definition of data mining referred to in H.Rept. 109-699 (this definition is discussed 
in more detail in "Legislation in the 109th Congress," below). Whereas the ATS
Passenger module calls for a search or examination of a traveler based on the 
traveler's personally identifying travel documents, the data mining definition in 
H.Rept. 109-699 only includes a search that "does not use a specific individual's 
personal identifiers to acquire information concerning that individual." 97 
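The distinction the Privacy Office draws can be illustrated with a minimal, hypothetical sketch. In a subject-based search, the query starts from a specific individual's personal identifier (here, an invented passport number); in a pattern-based search, no individual's identifier is an input, and records are selected only because they match a pattern. The record fields, sample values, and the pattern itself are assumptions made for illustration and are not drawn from ATS or any other DHS system.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TravelRecord:
    # Hypothetical fields, invented for illustration only.
    passport_no: str
    origin: str
    paid_cash: bool
    booked_days_before: int


# Invented sample data; not drawn from any real system.
RECORDS = [
    TravelRecord("X123", "AAA", paid_cash=False, booked_days_before=30),
    TravelRecord("Y456", "BBB", paid_cash=True, booked_days_before=1),
    TravelRecord("Z789", "CCC", paid_cash=True, booked_days_before=2),
]


def subject_based_search(records, passport_no):
    """Start from a specific individual's personal identifier."""
    return [r for r in records if r.passport_no == passport_no]


def pattern_based_search(records, max_days=2):
    """Start from a hypothetical pattern; no individual's identifier is an input."""
    return [r for r in records if r.paid_cash and r.booked_days_before <= max_days]


if __name__ == "__main__":
    print(subject_based_search(RECORDS, "Y456"))
    print(pattern_based_search(RECORDS))
```

Both functions may return the same traveler, but only the second reaches that record without being given anyone's identifier. Read this way, only a query shaped like `pattern_based_search` would fall within the H.Rept. 109-699 definition quoted above.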

On August 6, 2007, the Privacy Office of the Department of Homeland Security 
published a notice of proposed rulemaking (NPRM) proposing Privacy Act 
exemptions for the Automated Targeting System, 98 in the Federal Register. A 
Privacy Act System of Records Notice (SORN) 99 was also published in the same 
edition of the Federal Register. In addition, a revised Privacy Impact Assessment 
(PIA) for ATS was posted on the DHS website. 100 

According to the NPRM, ATS-P module records exempt from the Privacy Act
would include "the risk assessment analyses and business confidential information
received in the PNR from the air and vessel carriers." Records or information
obtained from other systems of records that are exempt from certain provisions of the
Privacy Act would retain their exemption in ATS. 101 In the NPRM, DHS states that
the exemptions are needed "to protect information relating to law enforcement
investigations from disclosures to subjects of investigations and others who could
interfere with investigatory and law enforcement activities."

95 Department of Homeland Security, Office of the Secretary, "Privacy Act of 1974; U.S.
Customs and Border Protection, Automated Targeting System, System of Records," 72
Federal Register 43650, August 6, 2007.

96 Department of Homeland Security, Privacy Office, 2007 Data Mining Report: DHS
Privacy Office Response to House Report 109-699, July 6, 2007,
[http://www.dhs.gov/xlibrary/assets/privacy/privacy_rpt_datamining_2007.pdf], p. 17.

97 Ibid., p. 7 and p. 17, footnote 34.

98 Department of Homeland Security, Office of the Secretary, "Privacy Act of 1974:
Implementation of Exemptions; Automated Targeting System," 72 Federal Register 43567,
August 6, 2007.

99 Department of Homeland Security, Office of the Secretary, "Privacy Act of 1974; U.S.
Customs and Border Protection, Automated Targeting System, System of Records," 72
Federal Register 43650, August 6, 2007.

100 See [http://www.dhs.gov/xlibrary/assets/privacy/privacy_pia_cbp_atsupdate.pdf].

101 Department of Homeland Security, Office of the Secretary, "Privacy Act of 1974:
Implementation of Exemptions; Automated Targeting System," 72 Federal Register 43567,

The August 6, 2007 SORN is a revised version of the November 2, 2006 SORN 
"which responds to those comments [received in response to the November 2006 
SORN], makes certain amendments with regard to the retention period and access 
provisions of the prior notice, and provides further notice and transparency to the 
public about the functionality of ATS." 102 The changes include 

• Reducing the "general retention period for data maintained in ATS" 
from 40 to 15 years, and adding a requirement that users obtain 
supervisory approval to access archived data in the last eight years 
of the retention period. 

• Allowing "persons whose PNR data has been collected and 
maintained in ATS-P [to] have administrative access to that data 
under the Privacy Act." Individuals will also be able to "seek to 
correct factual inaccuracies contained in their PNR data, as it is 
maintained by CBP." 

• Adding booking agents as a category of people from whom information
is obtained, in acknowledgment that booking agents' identities are
included in itinerary information.

• Amending the categories of people covered by ATS "to include
persons whose international itineraries cause their flight to stop in the
United States, either to refuel or permit a transfer, and crewmembers
on flights that overfly or transit through U.S. airspace."

• Clarifying "the categories of PNR data collected and maintained in 
ATS-P to more accurately reflect the type of data collected from air 
carriers." 

• Removing "two of the routine uses included in the earlier version of 
the SORN — those pertaining to using ATS in background checks." 

This revised SORN became effective on September 5, 2007. 

National Security Agency (NSA) and the Terrorist Surveillance
Program. In December 2005, news reports appeared for the first time revealing the
existence of a classified NSA terrorist surveillance program, dating back to at least
2002, involving the domestic collection, analysis, and sharing of telephone call
information. 103 Controversy over the program raised congressional concerns about
both the prevalence of homeland security data mining and the capacity of the
country's intelligence and law enforcement agencies to adequately analyze and share
counterterrorism information. The Senate Committee on the Judiciary held two
hearings regarding the issue on February 6 and February 28, 2006.

101 (...continued) August 6, 2007.

102 Department of Homeland Security, Office of the Secretary, "Privacy Act of 1974; U.S.

103 Peter Baker, "President Says He Ordered NSA Domestic Spying," The Washington Post,

Although details about the program are classified, statements by President Bush 
and Administration officials following the initial revelation of the program suggested 
that the NSA terrorist surveillance program focused only on international calls, with 
a specific goal of targeting the communications of al Qaeda and related terrorist 
groups, and affiliated individuals. It was also suggested that the program was 
reviewed and reauthorized on a regular basis and that key Members of Congress had 
been briefed about the program. 

In his weekly radio address on December 17, 2005, President Bush stated: 

In the weeks following the terrorist attacks on our nation, I authorized the 
National Security Agency, consistent with U.S. law and the Constitution, to 
intercept the international communications of people with known links to al 
Qaeda and related terrorist organizations. Before we intercept these 
communications, the government must have information that establishes a clear 
link to these terrorist networks. 104 

President Bush also stated during his radio address: 

The activities I authorized are reviewed approximately every 45 days. Each 
review is based on a fresh intelligence assessment of terrorist threats to the 
continuity of our government and the threat of catastrophic damage to our 
homeland. During each assessment, previous activities under the authorization 
are reviewed. The review includes approval by our nation's top legal officials, 
including the Attorney General and the Counsel to the President. I have 
reauthorized this program more than 30 times since the September the 11th 
attacks, and I intend to do so for as long as our nation faces a continuing threat 
from al Qaeda and related groups. 105 

In a January 27, 2006, public release statement, the Department of Justice stated: 

The NSA program is narrowly focused, aimed only at international calls and 
targeted at al Qaeda and related groups. Safeguards are in place to protect the 
civil liberties of ordinary Americans. 

• The program only applies to communications where one party is 
located outside of the United States. 

• The NSA terrorist surveillance program described by the President 
is only focused on members of Al Qaeda and affiliated groups. 
Communications are only intercepted if there is a reasonable basis 
to believe that one party to the communication is a member of al 
Qaeda, affiliated with al Qaeda, or a member of an organization 
affiliated with al Qaeda.

• The program is designed to target a key tactic of al Qaeda:
infiltrating foreign agents into the United States and controlling
their movements through electronic communications, just as it did
leading up to the September 11 attacks.

• The NSA activities are reviewed and reauthorized approximately
every 45 days. In addition, the General Counsel and Inspector
General of the NSA monitor the program to ensure that it is
operating properly and that civil liberties are protected, and the
intelligence agents involved receive extensive training. 106

104 President George W. Bush, "President's Radio Address," December 17, 2005,
[http://www.whitehouse.gov/news/releases/2005/12/20051217.html].

On February 6, 2006, in his written statement for a Senate Committee on the
Judiciary hearing, U.S. Attorney General Gonzales stated:

The terrorist surveillance program targets communications where one party to the 
communication is outside the U.S. and the government has "reasonable grounds 
to believe" that at least one party to the communication is a member or agent of 
al Qaeda, or an affiliated terrorist organization. This program is reviewed and 
reauthorized by the President approximately every 45 days. The Congressional 
leadership, including the leaders of the Intelligence Committees of both Houses 
of Congress, has been briefed about this program more than a dozen times since 
2001. The program provides the United States with the early warning system we
so desperately needed on September 10th. 107 

In May 2006, news reports alleged additional details regarding the NSA terrorist
surveillance program, renewing concerns about the possible existence of
inappropriately authorized domestic surveillance. According to these reports,
following the September 11, 2001 attacks, the NSA contracted with AT&T, Verizon,
and BellSouth to collect information about domestic telephone calls handled by these 
companies. The NSA, in turn, reportedly used this information to conduct "social 
network analysis" to map relationships between people based on their 
communications. 108 

It remains unclear precisely what information, if any, was collected and
provided to the NSA. Some reports suggest that personally identifiable information
(e.g., names and addresses) was not included. It also has been reported that the
content of the calls (what was spoken) was not collected. Since the emergence of
these news reports, BellSouth has issued a public statement saying that according to
an internal review conducted by the company, "no such [alleged] contract exists" and
that the company has "not provided bulk customer calling records to the NSA."
Similarly, Verizon has issued a public statement saying that due to the classified 
nature of the NSA program, "Verizon cannot and will not confirm or deny whether 
it has any relationship to the classified NSA program," but that "Verizon's wireless 
and wireline companies did not provide to NSA customer records or call data, local 
or otherwise." 110 Together, AT&T, Verizon, and BellSouth are the three largest 
telecommunications companies in the United States, serving more than 200 million 
customers, accounting for hundreds of billions of calls each year. 111 

In a January 17, 2007 letter to the Senate Committee on the Judiciary, then-Attorney
General Gonzales wrote that:

a Judge of the Foreign Intelligence Surveillance Court issued orders authorizing 
the Government to target for collection international communications into or out 
of the United States where there is probable cause to believe that one of the 
communicants is a member or agent of al Qaeda or an associated terrorist 
organization. As a result of these orders, any electronic surveillance that was 
occurring as part of the Terrorist Surveillance Program will now be conducted 
subject to the approval of the Foreign Intelligence Surveillance Court. 112 

The letter further stated that "the President has determined not to reauthorize the 
Terrorist Surveillance Program when the current authorization expires." 

The program and the alleged involvement of telecommunications companies
continue to be the subject of several lawsuits. For a discussion of these legal issues,
see CRS Report RL33424, Government Access to Phone Calling Activity and Related
Records: Legal Authorities, by Elizabeth B. Bazan, Gina Marie Stevens, and Brian
T. Yeh. Congress is also considering an option to provide a measure of retroactive
immunity to telecommunications companies that participated in government
counterterrorism surveillance activities between September 11, 2001, and January 17,
2007, as a provision of S. 2248, the Foreign Intelligence Surveillance Act of 1978
Amendments Act of 2007. For a discussion of this legislation, see CRS Report
RL33539, Intelligence Issues for Congress, by Richard A. Best, Jr.

Novel Intelligence from Massive Data (NIMD) Program. As part of its
efforts to better utilize the overwhelming flow of information it collects, NSA has
reportedly been supporting the development of new technology and data management
techniques by funding grants given by the Advanced Research and Development Activity
(ARDA). ARDA is an intelligence community (IC) organization whose mission is
described as "to sponsor high-risk, high-payoff research designed to leverage leading
edge technology to solve some of the most critical problems facing the Intelligence
Community (IC)." 113 ARDA's research support is organized into various technology
"thrusts" representing the most critical areas of development. Some of ARDA's
research thrusts include Information Exploitation, Quantum Information Science,
Global Infosystems Access, Novel Intelligence from Massive Data, and Advanced
Information Assurance.

The Novel Intelligence from Massive Data (NIMD) program focuses on the
development of data mining and analysis tools to be used in working with massive
data. 114 Novel intelligence refers to "actionable information not previously known."
Massive data refers to data that has characteristics that are especially challenging to
common data analysis tools and methods. These characteristics can include unusual
volume, breadth (heterogeneity), and complexity. Data sets that are one petabyte
(one quadrillion bytes) or larger are considered to be "massive." Smaller data sets
that contain items in a wide variety of formats, or are very heterogeneous (e.g.,
unstructured text, spoken text, audio, video, graphs, diagrams, images, maps,
equations, chemical formulas, tables), can also be considered "massive."
According to ARDA's website (no longer available), 115 "some intelligence data
sources grow at a rate of four petabytes per month now, and the rate of growth is 
increasing." With the continued proliferation of both the means and volume of 
electronic communications, it is expected that the need for more sophisticated tools 
will intensify. Whereas some observers once predicted that the NSA was in danger 
of becoming proverbially deaf due to the spreading use of encrypted 
communications, it appears that NSA may now be at greater risk of being "drowned" 
in information. 



Data Mining Issues 

As data mining initiatives continue to evolve, there are several issues Congress 
may decide to consider related to implementation and oversight. These issues 
include, but are not limited to, data quality, interoperability, mission creep, and 
privacy. As with other aspects of data mining, while technological capabilities are 
important, other factors also influence the success of a project's outcome. 
Data Quality

Data quality is a multifaceted issue that represents one of the biggest challenges 
for data mining. Data quality refers to the accuracy and completeness of the data. 
Data quality can also be affected by the structure and consistency of the data being 
analyzed. The presence of duplicate records, the lack of data standards, the 
timeliness of updates, and human error can significantly impact the effectiveness of 
the more complex data mining techniques, which are sensitive to subtle differences 
that may exist in the data. To improve data quality, it is sometimes necessary to
"clean" the data, which can involve removing duplicate records, normalizing the
values used to represent information in the database (e.g., ensuring that "no" is
represented the same way throughout the database, rather than sometimes as a 0,
sometimes as an N, etc.), accounting for missing data points, removing unneeded data
fields, identifying anomalous data points (e.g., an individual whose age is shown as
142 years), and standardizing data formats (e.g., changing dates so they all use the
MM/DD/YYYY format).
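The cleaning steps listed above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not a description of any agency's practice: the field names, the accepted input date formats, the tokens treated as "no" and "yes," and the 120-year cutoff used to flag an implausible age are all hypothetical choices.

```python
from datetime import datetime

# All of the following settings are assumptions made for illustration.
NO_TOKENS = {"0", "n", "no", "false"}
YES_TOKENS = {"1", "y", "yes", "true"}
DATE_FORMATS = ("%Y-%m-%d", "%m/%d/%Y")  # accepted input formats
MAX_PLAUSIBLE_AGE = 120                 # ages above this are flagged
KEEP_FIELDS = ("id", "name", "age", "flagged", "dob")  # all others are dropped


def normalize_flag(value):
    """Map the many spellings of yes/no onto 1/0; anything else becomes missing."""
    token = str(value).strip().lower()
    if token in NO_TOKENS:
        return 0
    if token in YES_TOKENS:
        return 1
    return None


def normalize_date(value):
    """Rewrite any accepted date format as MM/DD/YYYY; unparseable dates become missing."""
    for fmt in DATE_FORMATS:
        try:
            return datetime.strptime(value, fmt).strftime("%m/%d/%Y")
        except (TypeError, ValueError):
            continue
    return None


def clean(records):
    """Return (cleaned rows, ids with anomalous ages, ids with missing values)."""
    seen = set()
    cleaned, anomalies, missing = [], [], []
    for rec in records:
        # Remove unneeded fields by keeping only an explicit list of fields.
        row = {field: rec.get(field) for field in KEEP_FIELDS}
        # Normalize values and formats before records are compared.
        row["flagged"] = normalize_flag(row["flagged"])
        row["dob"] = normalize_date(row["dob"])
        # Drop records that are exact duplicates once normalized.
        key = tuple(row.items())
        if key in seen:
            continue
        seen.add(key)
        # Account for missing data points by recording them, not guessing values.
        if any(value is None for value in row.values()):
            missing.append(row["id"])
        # Flag implausible values for human review rather than deleting them.
        if row["age"] is not None and row["age"] > MAX_PLAUSIBLE_AGE:
            anomalies.append(row["id"])
        cleaned.append(row)
    return cleaned, anomalies, missing


# Invented sample records showing the problems described in the text.
RAW = [
    {"id": 1, "name": "A", "age": 34, "flagged": "N", "dob": "1990-03-07", "notes": "x"},
    # Same person, encoded differently: a duplicate only after normalization.
    {"id": 1, "name": "A", "age": 34, "flagged": "no", "dob": "03/07/1990", "notes": "y"},
    {"id": 2, "name": "B", "age": 142, "flagged": "0", "dob": "07/04/1881"},
    {"id": 3, "name": "C", "age": None, "flagged": "Y", "dob": "1975-12-01"},
]

if __name__ == "__main__":
    rows, odd_ages, gaps = clean(RAW)
    print(rows, odd_ages, gaps)
```

In this sketch the first two sample records are recognized as duplicates only after their "no" values and dates have been normalized, which is one reason normalization is applied before duplicate removal.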

Interoperability 

Related to data quality is the issue of interoperability of different databases and
data mining software. Interoperability refers to the ability of a computer system 
and/or data to work with other systems or data using common standards or processes. 
Interoperability is a critical part of the larger efforts to improve interagency 
collaboration and information sharing through e-government and homeland security 
initiatives. For data mining, interoperability of databases and software is important 
to enable the search and analysis of multiple databases simultaneously, and to help 
ensure the compatibility of data mining activities of different agencies. Data mining 
projects that are trying to take advantage of existing legacy databases or that are 
initiating first-time collaborative efforts with other agencies or levels of government 
(e.g., police departments in different states) may experience interoperability 
problems. Similarly, as agencies move forward with the creation of new databases 
and information sharing efforts, they will need to address interoperability issues 
during their planning stages to better ensure the effectiveness of their data mining 
projects. 
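As a hypothetical illustration of why common standards matter, the sketch below maps records from two invented agency databases, which store names and states differently, into one shared schema so that a single query can search both. The schemas, field names, and the small state-code table are assumptions made for the example and do not describe any actual agency system.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CommonRecord:
    # Hypothetical shared standard that both sources are mapped into.
    last_name: str
    first_name: str
    state: str  # two-letter postal code


# Assumed lookup table; a real one would cover every state.
STATE_CODES = {"Virginia": "VA", "Maryland": "MD"}


def from_agency_a(row):
    # Hypothetical Agency A: "Last, First" in one field, full state names.
    last, first = (part.strip() for part in row["full_name"].split(",", 1))
    return CommonRecord(last.upper(), first.upper(), STATE_CODES[row["state_name"]])


def from_agency_b(row):
    # Hypothetical Agency B: separate name fields, two-letter state codes.
    return CommonRecord(row["surname"].strip().upper(),
                        row["given"].strip().upper(),
                        row["st"].strip().upper())


def search_all(sources, predicate):
    """Convert each source to the common schema, then run one query over all of them."""
    return [rec for convert, rows in sources for rec in map(convert, rows) if predicate(rec)]


# Invented sample rows.
A_ROWS = [{"full_name": "Doe, Jane", "state_name": "Virginia"}]
B_ROWS = [{"surname": "doe", "given": "jane", "st": "va"},
          {"surname": "Roe", "given": "Rick", "st": "MD"}]

if __name__ == "__main__":
    hits = search_all([(from_agency_a, A_ROWS), (from_agency_b, B_ROWS)],
                      lambda r: r.last_name == "DOE")
    print(hits)
```

Without the mapping step, the same query would have to be written separately for each database and could miss matches that differ only in capitalization or in how the state is recorded.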

Mission Creep 

Mission creep is one of the leading risks of data mining cited by civil 
libertarians, and represents how control over one's information can be a tenuous 
proposition. Mission creep refers to the use of data for purposes other than those for
which the data was originally collected. This can occur regardless of whether the 
data was provided voluntarily by the individual or was collected through other 
means. 

Efforts to fight terrorism can, at times, take on an acute sense of urgency. This 
urgency can create pressure on both data holders and officials who access the data. 
To leave an available resource unused may appear to some as being negligent. Data 
holders may feel obligated to make any information available that could be used to 
prevent a future attack or track a known terrorist. Similarly, government officials 
responsible for ensuring the safety of others may be pressured to use and/or combine
existing databases to identify potential threats. Unlike physical searches, or the 
detention of individuals, accessing information for purposes other than originally 
intended may appear to be a victimless or harmless exercise. However, such 
information use can lead to unintended outcomes and produce misleading results. 

One of the primary reasons for misleading results is inaccurate data. All data 
collection efforts suffer accuracy concerns to some degree. Ensuring the accuracy of 
information can require costly protocols that may not be cost effective if the data is 
not of inherently high economic value. In well-managed data mining projects, the 
original data collecting organization is likely to be aware of the data's limitations and
account for these limitations accordingly. However, such awareness may not be 
communicated or heeded when data is used for other purposes. For example, the 
accuracy of information collected through a shopper's club card may suffer for a
variety of reasons, including the lack of identity authentication when a card is issued, 
cashiers using their own cards for customers who do not have one, and/or customers 
who use multiple cards. 116 For the purposes of marketing to consumers, the impact 
of these inaccuracies is negligible to the individual. If a government agency were to
use that information to target individuals based on food purchases associated with
particular religious observances, however, an outcome based on inaccurate information
could be, at the least, a waste of resources by the government agency and an
unpleasant experience for the misidentified individual. As the March 2004 TAPAC
report observes, the potential wide reuse of data suggests that concerns about mission 
creep can extend beyond privacy to the protection of civil rights in the event that 
information is used for "targeting an individual solely on the basis of religion or 
expression, or using information in a way that would violate the constitutional 
guarantee against self-incrimination." 117 

Privacy 

As additional information sharing and data mining initiatives have been 
announced, increased attention has focused on the implications for privacy. 
Privacy concerns focus both on actual proposed projects and on the potential for
data mining applications to be expanded beyond their original
purposes (mission creep). For example, some experts suggest that anti-terrorism data
mining applications might also be useful for combating other types of crime as 
well. 118 So far there has been little consensus about how data mining should be 
carried out, with several competing points of view being debated. Some observers 
contend that tradeoffs may need to be made regarding privacy to ensure security. 
Other observers suggest that existing laws and regulations regarding privacy 
protections are adequate, and that these initiatives do not pose any threats to privacy. 
Still other observers argue that not enough is known about how data mining projects
will be carried out, and that greater oversight is needed. There is also some 
disagreement over how privacy concerns should be addressed. Some observers 
suggest that technical solutions are adequate. In contrast, some privacy advocates 
argue in favor of creating clearer policies and exercising stronger oversight. As data 
mining efforts move forward, Congress may consider a variety of questions 
including the degree to which government agencies should use and mix commercial
data with government data, whether data sources are being used for purposes other 
than those for which they were originally designed, and the possible application of 
the Privacy Act to these initiatives. 



Legislation in the 108th Congress

During the 108th Congress, a number of legislative proposals were introduced
that would restrict data mining activities by some parts of the federal government,
and/or increase the reporting requirements of such projects to Congress. For
example, on January 16, 2003, Senator Feingold introduced S. 188, the Data-Mining
Moratorium Act of 2003, which would have imposed a moratorium on the 
implementation of data mining under the Total Information Awareness program (now 
referred to as the Terrorism Information Awareness project) by the Department of 
Defense, as well as any similar program by the Department of Homeland Security. 
S. 188 was referred to the Committee on the Judiciary. 

On January 23, 2003, Senator Wyden introduced S.Amdt. 59, an amendment to 
H.J. Res. 2, the Omnibus Appropriations Act for Fiscal Year 2003. As passed in its 
final form as part of the omnibus spending bill (P.L. 108-7) on February 13, 2003, 
and signed by the President on February 20, 2003, the amendment requires the 
Director of Central Intelligence, the Secretary of Defense, and the Attorney General 
to submit a joint report to Congress within 90 days providing details about the TIA 
program. 119 Some of the information required includes spending schedules, likely 
effectiveness of the program, likely impact on privacy and civil liberties, and any 
laws and regulations that may need to be changed to fully deploy TIA. If the report 
was not submitted within 90 days, funding for the TIA program could have been 
discontinued. 120 Funding for TIA was later discontinued in Section 8131 of the 
FY2004 Department of Defense Appropriations Act (P.L. 108-87), signed into law 
on September 30, 2003. 121 

On March 13, 2003, Senator Wyden introduced an amendment to S. 165, the Air
Cargo Security Act, requiring the Secretary of Homeland Security to submit a report 
to Congress within 90 days providing information about the impact of CAPPS II on 
privacy and civil liberties. The amendment was passed by the Committee on 
Commerce, Science, and Transportation, and the bill was forwarded for consideration
by the full Senate (S.Rept. 108-38). In May 2003, S. 165 was passed by the Senate 
with the Wyden amendment included and was sent to the House where it was 
referred to the Committee on Transportation and Infrastructure. 

Funding restrictions on CAPPS II were included in section 519 of the FY2004 
Department of Homeland Security Appropriations Act (P.L. 108-90), signed into law 
October 1, 2003. This provision included restrictions on the "deployment or 
implementation, on other than a test basis, of the Computer-Assisted Passenger
Prescreening System (CAPPS II)," pending the completion of a GAO report regarding
the efficacy, accuracy, and security of CAPPS II, as well as the existence of an appeals
process for individuals identified as a potential threat by the system. 122
In its report delivered to Congress in February 2004, GAO reported that "As of
January 1, 2004, TSA has not fully addressed seven of the eight CAPPS II issues 
identified by the Congress as key areas of interest." 123 The one issue GAO 
determined that TSA had addressed is the establishment of an internal oversight 
board. GAO attributed the incomplete progress on these issues partly to the "early 
stage of the system's development." 124 

On March 25, 2003, the House Committee on Government Reform 
Subcommittee on Technology, Information Policy, Intergovernmental Relations, and 
the Census held a hearing on the current and future possibilities of data mining. The 
witnesses, drawn from federal and state government, industry, and academia, 
highlighted a number of perceived strengths and weaknesses of data mining, as well 
as the still-evolving nature of the technology and practices behind data mining. 125 
While data mining was alternatively described by some witnesses as a process, and 
by other witnesses as a productivity tool, there appeared to be a general consensus 
that the challenges facing the future development and success of government data 
mining applications were related less to technological concerns than to other issues 
such as data integrity, security, and privacy. On May 6 and May 20, 2003, the
Subcommittee also held hearings on the potential opportunities and challenges for
using factual data analysis for national security purposes.

Section 519 of P.L. 108-90 specifically identifies eight issues that TSA must address
measures to protect against unauthorized access by hackers or other intruders; 7. establish

On July 29, 2003, Senator Wyden introduced S. 1484, The Citizens' Protection 
in Federal Databases Act, which was referred to the Committee on the Judiciary. 
Among its provisions, S. 1484 would have required the Attorney General, the 
Secretary of Defense, the Secretary of Homeland Security, the Secretary of the 
Treasury, the Director of Central Intelligence, and the Director of the Federal Bureau 
of Investigation to submit to Congress a report containing information regarding the 
purposes, type of data, costs, contract durations, research methodologies, and other 
details before obligating or spending any funds on commercially available databases. 
S. 1484 would also have set restrictions on the conduct of searches or analysis of 
databases "based solely on a hypothetical scenario or hypothetical supposition of who 
may commit a crime or pose a threat to national security." 

On July 31, 2003, Senator Feingold introduced S. 1544, the Data-Mining 
Reporting Act of 2003, which was referred to the Committee on the Judiciary. 
Among its provisions, S. 1544 would have required any department or agency 
engaged in data mining to submit a public report to Congress regarding these 
activities. These reports would have been required to include a variety of details 
about the data mining project, including a description of the technology and data to 
be used, a discussion of how the technology will be used and when it will be 
deployed, an assessment of the expected efficacy of the data mining project, a privacy 
impact assessment, an analysis of the relevant laws and regulations that would govern 
the project, and a discussion of procedures for informing individuals that their personal
information will be used and allowing them to opt out, or an explanation of why such 
procedures are not in place. 

Also on July 31, 2003, Senator Murkowski introduced S. 1552, the Protecting 
the Rights of Individuals Act, which was referred to the Committee on the Judiciary. 
Among its provisions, section 7 of S. 1552 would have imposed a moratorium on 
data mining by any federal department or agency "except pursuant to a law 
specifically authorizing such data-mining program or activity by such department or 
agency." It also would have required 

The head of each department or agency of the Federal Government that engages 
or plans to engage in any activities relating to the development or use of a data- 
mining program or activity shall submit to Congress, and make available to the 
public, a report on such activities. 

On May 5, 2004, Representative McDermott introduced H.R. 4290, the Data-Mining
Reporting Act of 2004, which was referred to the House Committee on
Government Reform Subcommittee on Technology, Information Policy, 
Intergovernmental Relations, and the Census. H.R. 4290 would have required 

each department or agency of the Federal Government that is engaged in any 
activity or use or develop data-mining technology shall each submit a public 
report to Congress on all such activities of the department or agency under the 
jurisdiction of that official. 
A similar provision was included in H.R. 4591/S. 2528, the Civil Liberties
Restoration Act of 2004. S. 2528 was introduced by Senator Kennedy on June 16, 
2004 and referred to the Committee on the Judiciary. H.R. 4591 was introduced by 
Representative Berman on June 16, 2004 and referred to the Committee on the 
Judiciary and the Permanent Select Committee on Intelligence. 



Legislation in the 109th Congress

Data mining continued to be a subject of interest to Congress in the 109th
Congress. On April 6, 2005, H.R. 1502, the Civil Liberties Restoration Act of 2005,
was introduced by Representative Berman and was referred to the Committee on the
Judiciary, 126 the Permanent Select Committee on Intelligence, and the Committee on
Homeland Security. Section 402, Data-Mining Report, of H.R. 1502 would have
required that

The Head of each department or agency of the Federal Government that is 
engaged in any activity or use or develop data-mining technology shall each 
submit a public report to Congress on all such activities of the department or 
agency under the jurisdiction of that official. 

As part of their content, these reports would have been required to provide, for each 
data mining activity covered by H.R. 1502, information regarding the technology 
and data being used; information on how the technology would be used and the target 
dates for deployment; an assessment of the likely efficacy of the data mining 
technology; an assessment of the likely impact of the activity on privacy and civil 
liberties; a list and analysis of the laws and regulations that would apply to the data 
mining activity and whether these laws and regulations would need to be modified 
to allow the data mining activity to be implemented; information on the policies, 
procedures, and guidelines that would be developed and applied to protect the 
privacy and due process rights of individuals, and ensure that only accurate 
information is collected and used; and information on how individuals whose 
information is being used in the data mining activity will be notified of the use of 
their information, and, if applicable, what options will be available for individuals to
opt out of the activity. These reports would have been due to Congress no later than
90 days after the enactment of H.R. 1502, and would have been required to be 
updated annually to include "any new data-mining technologies." 

On June 6, 2005, S. 1169, the Federal Agency Data-Mining Reporting Act of
2005, was introduced by Senator Feingold and was referred to the Senate Committee
on the Judiciary. Among its provisions, S. 1169 would have required any department
or agency engaged in data mining to submit a public report to Congress regarding
these activities. These reports would have been required to include a variety of
details about the data mining project, including a description of the technology and
data to be used, a discussion of the plans and goals for using the technology and when it
will be deployed, an assessment of the expected efficacy of the data mining project,
a privacy impact assessment, an analysis of the relevant laws and regulations that
would govern the project, and a discussion of procedures for informing individuals
that their personal information will be used and allowing them to opt out, or an
explanation of why such procedures are not in place.

126 H.R. 1502 was referred to the Subcommittee on Immigration, Border Security, and
Claims on May 10, 2005, and later discharged by the subcommittee on July 8, 2005.

On July 11, 2005, H.R. 3199, the USA PATRIOT Improvement and
Reauthorization Act of 2005, was introduced. On July 21, 2005, Representative
Berman introduced H.Amdt. 497 to H.R. 3199, which would have required the Attorney
General to submit a report to Congress on the data mining initiatives of the
Department of Justice and of other departments and agencies. The provision
stated, in part:

The Attorney General shall collect the information described in paragraph (2) 
from the head of each department or agency of the Federal Government that is 
engaged in any activity to use or develop data-mining technology and shall report 
to Congress on all such activities. 

H.Amdt. 497 was passed on July 21, 2005 by a 261-165 recorded vote and appeared 
as Section 132 of H.R. 3199. Also on this day, H.R. 3199 was passed by the House 
and sent to the Senate. On July 29, 2005, the Senate passed an amended version of 
H.R. 3199. The Senate version did not contain a comparable provision on data 
mining. The bill went to a House-Senate conference in November 2005. Section 126 
of the conference report (H.Rept. 109-333) filed on December 8, 2005 included a 
provision for a report on data mining by the Department of Justice alone, rather than 
other departments and agencies as well. The provision stated, in part: 

Not later than one year after the date of enactment of this Act, the Attorney 
General shall submit to Congress a report on any initiative of the Department of 
Justice that uses or is intended to develop pattern-based data mining technology. . . 

The bill was signed into law as P.L. 109-177 on March 9, 2006. 

On October 6, 2005, H.R. 4009, the Department of Homeland Security Reform 
Act of 2005, was introduced by Representative Thompson, and was referred to the 
Committee on Homeland Security, the Permanent Select Committee on Intelligence, 
and the Committee on Transportation and Infrastructure. Section 203(c)(16) would 
have directed the Chief Intelligence Officer, as established in Section 203(a): 

To establish and utilize, in conjunction with the Chief Information Officer of the 
Department, a secure communications and information technology infrastructure, 
including data-mining and other advanced analytical tools, in order to access, 
receive, and analyze data and information in furtherance of the responsibilities 
under this section, and to disseminate information acquired and analyzed by the 
Department, as appropriate. 

On December 6, 2005, H.R. 4437, the Border Protection, Antiterrorism, and 
Illegal Immigration Control Act of 2005, was introduced by Representative
Sensenbrenner and was referred to the Committee on the Judiciary and the 
Committee on Homeland Security. On December 8, 2005, the Committee on the 
Judiciary held a markup session and ordered an amended version of H.R. 4437 to be 
reported. On December 13, 2005, the Committee on Homeland Security discharged 
the bill, which was subsequently referred to and discharged from the Committee on
Education and the Workforce and the Committee on Ways and Means. On 
December 16, 2005, H.R. 4437 was passed by the House and sent to the Senate, 
where it was referred to the Committee on the Judiciary. 

Section 1305, Authority of the Office of Security and Investigations to Detect 
and Investigate Immigration Benefits Fraud, of H.R. 4437 would have granted the 
Office of Security and Investigations of the United States Citizenship and 
Immigration Services at the Department of Homeland Security the authority:

(1) to conduct fraud detection operations, including data mining and analysis; 

(2) to investigate any criminal or noncriminal allegations of violations of the 
Immigration and Nationality Act or title 18, United States Code, that Immigration 
and Customs Enforcement declines to investigate; 

(3) to turn over to a United States Attorney for prosecution evidence that tends 
to establish such violations; and 

(4) to engage in information sharing, partnerships, and other collaborative efforts 
with any — 

(A) Federal, State, or local law enforcement entity; 

(B) foreign partners; or 

(C) entity within the intelligence community (as defined in section 3(4) of the 
National Security Act of 1947 (50 U.S.C. 401a(4)). 

On July 12, 2006, Senator Feingold introduced S.Amdt. 4562 to H.R. 5441, the
Homeland Security Department FY2007 appropriations bill. S.Amdt. 4562 is
substantively similar to S. 1169, although it applies only to departments and agencies
within the Department of Homeland Security, rather than the entire federal
government. S.Amdt. 4562 was agreed to by unanimous consent and was included
in the Senate-passed version of H.R. 5441 as Section 549. According to the
conference report (H.Rept. 109-699), Section 549 was deleted from the final bill that
was passed into law (P.L. 109-295). 127 However, the conference report also included
a statement on data mining by the conference managers that expressed concern about
the development and use of data mining technology and that

"direct[s] the DHS Privacy Officer to submit a report consistent with the terms 
and conditions listed in section 549 of the Senate bill. The conferees expect the 
report to include information on how it has implemented the recommendation 
laid out in the Department's data mining report received July 18, 2006." 128 



Legislation and Hearings in the 110th Congress

Data mining has been the subject of some of the earliest proposed bills and
hearings of the 110th Congress. On January 10, 2007, S. 236, the Federal Agency
Data-Mining Reporting Act of 2007, was introduced by Senator Feingold and Senator
Sununu and was referred to the Senate Committee on the Judiciary. Among its
provisions, S. 236 would require any department or agency engaged in data mining
to submit a public report to Congress regarding these activities. These reports would
be required to include a variety of details about the data mining project, including a
description of the technology and data to be used, a discussion of the plans and goals
for using the technology and when it will be deployed, an assessment of the expected
efficacy of the data mining project, a privacy impact assessment, an analysis of the
relevant laws and regulations that would govern the project, and a discussion of
procedures for informing individuals that their personal information will be used and
allowing them to opt out, or an explanation of why such procedures are not in
place. 129

127 See p. 180.

128 Ibid., p. 117. The DHS Privacy Office delivered the requested report to Congress on July
6, 2007. A copy of the report is available at
[http://www.dhs.gov/xlibrary/assets/privacy/privacy_rpt_datamining_2007.pdf].

Also in the Senate, the Committee on the Judiciary held a hearing on January 
10, 2007 entitled "Balancing Privacy and Security: The Privacy Implications of 
Government Data Mining Programs." The witnesses included a former Member of 
Congress and several individuals from research centers and think tanks. Collectively,
they highlighted a number of perceived strengths and weaknesses of data mining, as
well as the continually evolving nature of the technology and practices behind data
mining. 130 The witnesses also addressed the inherent challenge of protecting the
nation from terrorism while also protecting civil liberties.

On February 28, 2007, Senator Reid introduced S.Amdt. 275 to S. 4, the
Improving America's Security by Implementing Unfinished Recommendations of the
9/11 Commission Act of 2007. Section 504 of this amendment, entitled the Federal
Agency Data Mining Report Act of 2007, was identical to S. 236, as introduced.
During the Senate floor debates held on S. 4 in early March 2007, several 
amendments to the data mining section of S. 4 were introduced. 

On March 6, 2007, Senator Kyl introduced S.Amdt. 357 to S.Amdt. 275 of S. 4.
The purpose of S.Amdt. 357 was described as "to amend the data-mining
reporting requirement to protect existing patents, trade secrets, and confidential
business processes, and to adopt a narrower definition of data mining in order to
exclude routine computer searches." 131 Later on March 6, 2007, Senator Kyl offered
a modification to S.Amdt. 357 that used definitions of data mining and database very
similar to those that appear in P.L. 109-177, the USA PATRIOT Improvement and
Reauthorization Act of 2005, and that slightly changed the original language of
S.Amdt. 357 regarding protection of patents and other proprietary business
information.



129 On April 12, 2007, the Committee voted to approve a revised version of S. 236, which 
was sent to the full Senate. A description of this version of the bill is discussed later in the 
chronology of this section of the report. 

130 Witnesses testifying at the hearing included former Representative Robert Barr of Liberty 
Strategies, LLC; James Carafano of the Heritage Foundation; Jim Harper of the CATO 
Institute; Leslie Harris of the Center for Democracy and Technology; and Kim Taipale of 
the Center for Advanced Studies in Science and Technology. 

131 Congressional Record, vol. 153, March 6, 2007, p. S2670. 
On March 8, 2007, Senator Feingold introduced S.Amdt. 429 to S.Amdt. 275.
S.Amdt. 429 is very similar to S. 236, as introduced, with a few differences. One 
difference is that the initial description used to partially define data mining is 
changed to include "a program involving pattern-based queries, searches, or other 
analyses of 1 or more electronic databases...." Another difference is that the data 
mining reporting requirement excludes data mining initiatives that are solely for "the 
detection of fraud, waste, or abuse in a Government agency or program; or the 
security of a Government computer system." 132 A third difference is the inclusion
of language requiring that the data mining reports be "produced in coordination with 
the privacy officer of that department or agency." 133 S.Amdt. 429 also includes 
language detailing the types of information that should be included in the classified 
annexes of the data mining reports (i.e., classified information, law enforcement 
sensitive information, proprietary business information, and trade secrets), and states 
that such classified annexes should not be made available to the public. 

Later on March 8, 2007, Senator Feingold introduced S.Amdt. 441 to S.Amdt. 
357. S.Amdt. 441 is substantively the same as S.Amdt. 429, but with a technical 
modification. 

On March 13, 2007, S.Amdt. 441 was agreed to by unanimous consent, and 
S.Amdt. 357, as modified, and as amended by S.Amdt. 441, was agreed to by
unanimous consent. Also on March 13, 2007, S. 4 passed the Senate by a 60-38 vote. 
The data mining provision appears as Section 604 in S. 4. As originally passed by 
the House in January 2007, the House version of S. 4, H.R. 1, did not contain a 
comparable provision on data mining. 

On March 21, 2007, the House Committee on Appropriations Subcommittee on 
Homeland Security held a hearing entitled "Privacy and Civil Rights in Homeland 
Security." The witnesses included Hugo Teufel III, the Chief Privacy Officer at 
DHS; Daniel Sutherland of the Office of Civil Rights and Civil Liberties at DHS; and
the Government Accountability Office (GAO). Collectively they addressed some of 
the data mining activities being carried out by DHS, in particular the use of the 
Analysis, Dissemination, Visualization, Insight, and Semantic Enhancement 
(ADVISE) data mining tool, and the precautions taken by DHS to protect citizens' 
privacy and civil liberties. 

On April 12, 2007, the Senate Committee on the Judiciary voted to approve a 
revised version of S. 236, the Data Mining Act of 2007. On June 4, 2007, the 
Committee reported the bill. With one exception, this revised version of S. 236 is
substantively identical to the data mining provision passed as Section 604 in S. 4, and
later as Section 804 of P.L. 110-53 in July 2007 (discussed below). As passed by the
Committee, S. 236 includes a provision regarding penalties for the unauthorized 
disclosure of classified information contained in the annex of any reports submitted 
to Congress. 



132 Congressional Record, vol. 153, March 8, 2007, p. S2949. 

133 Ibid. 
On June 15, 2007, the House of Representatives passed H.R. 2638, concerning
FY2008 appropriations for the Department of Homeland Security. The 
accompanying House Report (H.Rept. 110-181) includes language prohibiting 
funding for the Analysis, Dissemination, Visualization, Insight, and Semantic 
Enhancement (ADVISE) data mining program until DHS has completed a privacy 
impact assessment for the program. ADVISE is alternatively described as a 
technology framework, or a tool, for analyzing and visually representing large 
amounts of data. ADVISE is being developed by the Directorate for Science and 
Technology at DHS. The accompanying Senate Report (S.Rept. 110-84) for S. 1644, 
concerning FY2008 DHS appropriations, also includes similar language 
recommending that no funding be allocated for ADVISE until a program plan and 
privacy impact assessment are completed.

On July 9, 2007, the Senate took up H.R. 1, struck all language following the 
enacting clause, substituted the language of S. 4 as amended, and passed the bill by 
unanimous consent. Differences between H.R. 1 and S. 4 were resolved in 
conference later that month. The data mining provision that appeared as Section 604 
in S. 4 was retained as Section 804 in the agreed-upon bill. On July 26, 2007, the
Senate agreed to the conference report (H.Rept. 110-259) by an 85-8 vote. On July 27,
2007, the House agreed to the conference report by a 371-40 vote. On August 3,
2007, the bill was signed into law by the President as P.L. 110-53.